I recently sat down with Thomas B. Fordham Institute’s Michael Petrilli to discuss six great education heroes mentioned in my book, “Saving Schools.” You can listen to the interview here.
Confession: I love Richard Wagner’s operas and have seen his famous ‘Ring’ cycle three times, the third time a week ago in San Francisco. If you have seen or read Lord of the Rings, you sort of know the story line: Gods, giants, Rhinemaidens, dragons, dwarfs, Valhalla, and all that.
It is now fashionable for directors to forget any ideas Wagner himself had about the production and invent their own interpretation of the story. Nowhere is that kind of thing done more brazenly than in San Francisco. Our gods were New York hedge-fund investors, our dragon was a machine, and the beautiful, flowing Rhine river, where maidens guarded the Ring, became a cesspool. As the New York Times reviewer Anthony Tommasini (July 5, 2011) generously put it, the director should have put all of her ideas on a single page and then cut a third of them.
Twelve hours into the opera, distraught, I let my mind wander. Would it be possible to get some opera company— perhaps students at some adventurous school for the performing arts—to do a school-reform Ring cycle? The story might unfold at a New Jersey school board meeting, and the characters would be school board members, union leaders, ed school professors and so forth.
I envision school board president Wotan and his fractious board member, Fricka, fighting over the child, Freia (the beautiful girl with the youth-preserving golden apples that she brings to school each day), who has been doomed by the collective bargaining contract signed with the Teacher Union Giants, AFT Fafner and NEA Fasolt, to have four years in classrooms managed by dreadful teachers—who would have been fired were it not for that contract.
Meanwhile, Fafner and Fasolt, in exchange for beautiful Freia, have built the palatial school administration building, Valhalla. Duly elected with union money, Wotan and the rest of the board will enter the palace if only they can free Freia from her contractual fate.
To Nibelheim Wotan goes to meet Chief Dwarf Professor Alberich, the ed school guru, who has the Ring that he had stolen from frolicking, book-reading, math-computing, Rhinemaiden school children by promising that he would never love another child nor create a school where kids could learn.
Empowered by the Ring, he rules the education world with his brilliant lectures—and pockets the gold received on the lecture circuit (where shriek his minions, the Nibelungen). When Wotan tricks Professor Alberich into turning himself into a toad, Wotan captures the Ring. The Professor curses all who shall ever touch that Ring.
Wotan ignores the curse but is forced to give the Ring and the gold to the Union Giants. It goes to pay union membership dues and pension fund contributions under the control of the Union Giants in exchange for Freia, who escapes her fate by taking her golden apples with her to classes with effective teachers. No sooner do the Union bosses get the Ring than they fight over it. Fafner kills his buddy and turns himself into a dragon to hoard the gold.
The education world would be altogether lost were it not for the series of unlikely accidents that produce Siegfried Duncan and Brunnhilde Rhee, strong-willed, valiant education reformers. Siegfried kills the Union Dragon and retrieves the Ring, but is then killed by Professor Alberich’s son, Policy Analyst Hagen. Yet Brunnhilde finds a way to rescue the Ring, then sacrifice her life to return it to the Rhinemaidens, bringing the era of the School Board Gods to an end. Analyst Hagen drowns himself in a futile effort to grab the Ring from the children swimming in the water, now happily learning their lessons.
Sorry, Wagner, for the happy ending, but this production is for young people.
Michelle Rhee’s public popularity has shifted upward within the District of Columbia, pollsters tell us, but the elites who chair the committee set up by the National Research Council (NRC) of the National Academy of Sciences to assess Rhee’s chancellorship are holding firm to their anti-Rhee convictions, no matter what the evidence.
In my recent Education Next essay, I identified the biases and inaccuracies in their report on Rhee’s chancellorship. Specifically, I pointed out that gains on the National Assessment of Educational Progress under Rhee’s tenure were much larger than average gains for the other ten urban school districts participating in the assessment in 8th grade math and in 4th grade reading and math. (I also reported that 8th grade reading results are less favorable for the District of Columbia.)
In their reply to my critique, co-chairs Robert Hauser and Christopher Edley say that D. C. gains were “reliably higher than that of only two districts (Austin and Cleveland) in grade 4 mathematics and one district (Cleveland) in grade 4 reading—but no others. This finding has no particular ideological or political bent; it is the result of careful and straightforward analysis of data.”
Methinks thou dost protest too much, Messrs. Hauser and Edley. You are both good enough statisticians to know that averages across multiple cases (which I reported) are more informative than case-by-case comparisons (which you rely upon), each of which is noisy and therefore more likely to show no statistically significant difference.
In other words, average performance across ten cities provides a more precise and therefore more informative estimate of the underlying truth of the matter than do noisy individual city-by-city comparisons.
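The statistical point can be illustrated with a small simulation. All of the numbers below are made up for illustration; they are not actual NAEP figures. Each individual city comparison carries the full measurement noise, while an average across ten comparisons shrinks the standard error by the square root of ten:

```python
import math
import random

random.seed(0)

# Hypothetical setup: the district's true gain exceeds each comparison
# city's true gain by the same amount, but every measured gain is noisy.
true_advantage = 3.0   # illustrative true edge, in test-score points
noise_sd = 4.0         # illustrative sampling noise in each city's measured gain
n_cities = 10

# Simulated measured differences (district gain minus each city's gain);
# a difference of two noisy gains has SD = noise_sd * sqrt(2).
diffs = [true_advantage + random.gauss(0, noise_sd * math.sqrt(2))
         for _ in range(n_cities)]

# City-by-city comparison: each has this much noise, so a modest true
# edge often fails to reach statistical significance.
se_single = noise_sd * math.sqrt(2)

# Averaging across the ten comparisons cuts the standard error by sqrt(10).
mean_diff = sum(diffs) / n_cities
se_mean = se_single / math.sqrt(n_cities)

print(f"single-comparison SE: {se_single:.2f}")
print(f"averaged SE:          {se_mean:.2f}")
print(f"mean measured edge:   {mean_diff:.2f}")
```

With these illustrative values, the averaged comparison is more than three times as precise as any single city-by-city comparison, which is exactly why pooling can reveal a pattern that individual comparisons obscure.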
To knowingly elevate noisy data over more precise information is to pursue a “particular ideological or political bent.”
What is disturbing about all this is Robert Hauser’s new role as head of NRC’s education division. When the leader of a key division at NRC is engaged in promoting his own ideological agenda, all kinds of misleading reports can emerge from an allegedly scientific agency dependent upon government funding. Members of Congress should take special notice of this fact when asked to fund still another NRC study.
The evidence for all this comes not just from the DC report discussed here. For more on what is going on at the NRC, take a look at the recent report on school accountability, which Eric Hanushek has subjected to withering criticism.
Recently, Education Next released a path-breaking, peer-reviewed study by Ludger Woessmann which estimated long-term impacts of merit pay arrangements for teachers on student performance in math, science and reading at age 15, using international data from the Programme for International Student Assessment (PISA). The study has the great advantage of providing an estimate of long-term impacts of merit pay that cannot be identified by looking at the impact of policy innovations after two or three years. However, the study is necessarily limited by the fact that it is based on observations from only the countries for which relevant data are available.
Even though Woessmann’s innovative study was executed with great care and sophistication—and a version is now available in the Economics of Education Review—a group that calls itself the National Education Policy Center—which receives substantial funding from teacher unions—has persuaded a reviewer to write a misleading critique of the paper. Such critiques are standard practice for the NEPC, which critically reviews many studies, no matter how well executed, if the findings do not lend support to positions the unions have taken. Fortunately, Woessmann has agreed to take the time to reply to a review more disingenuous than thoughtful. His response is highly technical, but for those interested in the methodological specifics, it is worth a careful read.
Ludger Woessmann replies:
The NEPC review, which makes a number of critical and partly strident claims about my paper, “Cross-Country Evidence on Teacher Performance Pay,” is a perfect example of a case where there is a lot of new and correct material in the text – but alas, what is correct is not new and what is new is not correct. Let’s start with the “not so new” statements.
The reviewer states: “The primary claim of this Harvard Program on Education Policy and Governance report and the abridged Education Next version is that nations ‘that pay teachers on their performance score higher on PISA tests.’ After statistically controlling for several variables, the author concludes that nations with some form of merit pay system have, on average, higher reading and math scores on this international test of 15-year-old students.” This is not a “claim,” but simply a statement of a descriptive fact. Not even the reviewer can deny that.
The bottom-line criticism of the reviewer is that “drawing policy conclusions about teacher performance pay on the basis of this analysis is not warranted.” That statement is hardly new. Compare it to my own conclusion in my abridged version in Education Next: “Although these are impressive results, before drawing strong policy conclusions it is important to confirm the results through experimental or quasi-experimental studies carried out in advanced industrialized countries.” Where’s the substantive difference that would justify the strident review?
Next, the reviewer states repeatedly that “attributing causality is problematic” in such a study based on observational data. Right – this is exactly what my paper states very clearly a number of times, and addresses with a number of additional analyses. Even in the abridged version of the study, I take substantial care to highlight the cautions that remain with the study. It is seriously misleading for a reviewer to present as criticism the very caveats highlighted in the study itself.
Additional limitations of the analysis highlighted in the paper and simply repeated by the reviewer are that the number of country observations is limited to 28 OECD countries and that the available measure of teacher performance pay is imperfect. In particular, the measure does not distinguish different forms and intensities of the performance-pay scheme. The value added by such repetition is unclear to me. However, what is ignored by the reviewer – and what starts to bridge the case from “not so new” to “not so correct” – is that all these factors play against the findings of the paper. They limit statistical power and possibly bias the coefficient estimate downwards – and, in this sense, make the finding only stronger.
Now for the directly “not correct” statements. The review claims that dropping a single country can overturn the results. This is not correct. As stated in the study, qualitative results are robust to dropping any individual country, as well as to dropping obvious groups of countries. (Of course, the point estimates vary somewhat, albeit not in a statistically significant way – what else should be expected?) The review also claims that the “geographical distance between countries, or clusters of countries,” may drive the results. But the study reports specifications with continental fixed effects and specifications that drop different clusters of countries, both of which speak against this being a “serious concern.”
The press release for the review (although not the review itself) claims that “The data are analyzed at the country level.” In fact, all regressions are performed at the level of over 180,000 students, controlling for a large set of background factors at the student and school level. The information on the possibility of performance pay, though, is at the system level.
The press release also highlights the point raised above about heterogeneity in the performance-pay schemes by stating that “Perhaps one type of approach is beneficial, while another is detrimental.” Right – but the whole point is that on average they are positively related to achievement.
The method used in the paper – clustering-robust linear regressions – may not be well known to the reviewer, but – contrary to the reviewer’s claim – it does in fact take care of the hierarchical structure of the error terms. Monte Carlo analyses have even shown that they do so in a way that is usually more robust than the methods suggested by the reviewer (multilevel modeling).
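The mechanics of clustering-robust standard errors can be sketched with simulated data. Everything below is illustrative (the data are randomly generated, not PISA): students are nested in countries, the regressor of interest varies only at the country level, and a country-level shock creates exactly the hierarchical error structure at issue. The cluster-robust (CR0) variance sums the score contributions within each country before forming the sandwich, whereas naive i.i.d. standard errors ignore the clustering and come out far too small:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: students nested in "countries"; the regressor of interest
# (a performance-pay indicator) varies only at the country level.
n_countries, n_per = 28, 50
country = np.repeat(np.arange(n_countries), n_per)
pay = np.repeat(rng.integers(0, 2, n_countries), n_per).astype(float)
# A country-level shock induces within-country error correlation.
shock = np.repeat(rng.normal(0, 1, n_countries), n_per)
y = 0.5 * pay + shock + rng.normal(0, 1, n_countries * n_per)

X = np.column_stack([np.ones_like(pay), pay])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Cluster-robust "meat": sum over countries of (X_g' u_g)(X_g' u_g)'
meat = np.zeros((2, 2))
for g in range(n_countries):
    m = country == g
    s = X[m].T @ resid[m]
    meat += np.outer(s, s)
cluster_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Naive (i.i.d.) standard errors, which wrongly ignore the clustering.
sigma2 = resid @ resid / (len(y) - 2)
naive_se = np.sqrt(np.diag(sigma2 * XtX_inv))

print("naive SE on pay coefficient:  ", naive_se[1])
print("cluster SE on pay coefficient:", cluster_se[1])
```

Because the treatment varies only across countries, the cluster-robust standard error on the pay coefficient is many times the naive one here, which is the hierarchical error structure being accounted for.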
The reviewer wrongly claims that my “report concludes … that the threat of omitted variable bias is … proven to be negligible.” I am not aware how any empirical study could prove such a thing – “proving” that omitted variable bias is negligible is clearly scientific nonsense.
Learning from International Data
The bottom line is whether, despite the caveats that my study itself mentions, we can learn anything from the cross-country analysis. Of course we can. The paper presents new empirical evidence that complements existing studies on performance pay, not least because the cross-country design goes some way to capture general-equilibrium effects of teacher sorting that have eluded existing experimental studies. Some evidence, combined with extensive robustness checks, is clearly better than no evidence, not least as a basis for policy discussion.
The reviewer did not present a single attempt to test whether his claims have any validity. By contrast, the reviewed study has put clear evidence on the table, and shown that it is robust to a forceful set of validity checks (the more demanding of which are not even discussed in the review). It is up to the reader to decide which approach – the one of the original study or the one of the reviewer – is more convincing. But it even seems that the reviewer, despite the strident language contained in the press release that summarizes his analysis, in the end agrees with my assessment: “The study presented in the Harvard Program on Education and Governance report, and abridged in Education Next, is a step in the right direction.”
Adjusting for National Trends
Ginsburg: “Crucial to Peterson’s claims is that the DC score improvement should be computed only as the excess above the national average NAEP gain….This criticism makes little sense.”
Reply: Should we adjust for national trends when assessing how well a particular district is doing over a specific period of time? In Ginsburg’s view, the nation is too heterogeneous to be commonly affected by the financial disaster of 2007 or the impact of NCLB, or some other broad, national trend. Generally speaking, trends within districts more often parallel national trends than diverge from them, and it is for that reason that the adjustment I made is routinely undertaken when estimating impacts.
But in this case there is a more specific reason for adjusting for national trends—the variability in NAEP’s own test. Although NAEP attempts to standardize its test from one administration to the next, its efforts in this regard are more strenuous when administering the Long-Term Trend version of NAEP (LTT) than when administering the main NAEP (MAIN), upon which Ginsburg depends for his conclusions.
MAIN measures student performance in grades 4 and 8, while LTT measures student performance at ages 9, 13 and 17. The LTT is the preferred measure for estimating trends, because age-dependent developmental factors affect student performance and one cannot be sure that the ages of students in 4th and 8th grade remain constant over time.
None of this would matter much, were it not for the fact that the two tests have been yielding divergent results. As Brookings scholar Tom Loveless has pointed out to me, students in 4th grade are making spectacular gains on MAIN, but those gains have not been duplicated on the LTT. Between 1990 and 2009, 4th graders gained 27 points on the MAIN math test (just 7 points less than the size of the gap between DC and U. S. performance in 2000), but they gained only 13 points on the LTT test. For 8th graders, the gains in math were 21 points on the MAIN but only 11 points on the LTT. We don’t know if the difficulty is that MAIN was simplified during this period, or whether LTT was made more challenging (though, as I said, the LTT test is designed especially to get a stable measure over time). But any look at trends over time needs to adjust for likely variation in the design and administration of MAIN.
The best available solution is to examine the extent to which changes in district performance close the gap between the district and the nation, as I have done. Ginsburg argues against such an adjustment, saying it “makes no sense” but, then, within the same paragraph, engages in an analysis similar to what I have recommended, saying, to wit: “For math…DC gains at grade 4 were higher than any state” over the full 2000-07 period.
Such comparisons need to be carried out, not anecdotally, as in Ginsburg’s comment, but systematically, as I have done, by looking at the extent to which DC closed the gap between its performance and that of the nation.
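The gap-closing adjustment described above amounts to simple arithmetic. Here is a minimal sketch, using invented score values purely for illustration (not actual NAEP results):

```python
# Hypothetical NAEP averages, for illustration only (not actual data).
def annual_gap_closure(dc_start, dc_end, us_start, us_end, years):
    """Points per year by which a district closed its gap with the nation.

    Positive values mean the district gained on the nation; this nets out
    any broad national trend (or test drift) common to both series.
    """
    gap_start = us_start - dc_start
    gap_end = us_end - dc_end
    return (gap_start - gap_end) / years

# Example: the district gains 8 points while the nation gains 3 over
# two years, so the 10-point gap shrinks to 5, or 2.5 points per year.
print(annual_gap_closure(dc_start=230, dc_end=238,
                         us_start=240, us_end=243, years=2))  # 2.5
```

Because the national series is subtracted out, a uniform drift in MAIN's difficulty over the period would cancel, which is the point of the adjustment.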
Ginsburg says he did not fail to adjust for the fact that “Rhee was in office for only two years.”
Reply: It is true that Ginsburg’s tables report annual gains, but his summary material does not. On page 8 of his paper, in the first statement of the findings in the main text of the paper, under the heading “overall results,” one encounters the following words:
“For math between 2000-09, Vance accounted for 46% of the share of the total gain in NAEP scores for both grades 4 and 8, Janey 30%, and Rhee 24%…..For reading between 2003-09, Janey accounted for 65% of the total gain in NAEP scores over grades 4 and 8, and Rhee accounted for 35%.”
Beyond any shadow of a doubt, those prominent statements in his report constitute a misleading comparison that fails to adjust for the fact that “Rhee was in office for only two years.”
Ginsburg defends the accuracy of his data as follows: “My report clearly specifies that I used the state NAEP series because of its consistent treatment of charter schools over the full 2000-2009 period.”
Reply: Ginsburg’s “consistent treatment of charter schools” is to include the students attending them in his assessment of Rhee’s performance without informing his readers of that fact. This is no trivial matter, as nearly a third of DC students attend charter schools, which operate autonomously of the district.
In my essay, I did my best to put to one side data for those students who were attending charter schools not authorized by the district. Another way to proceed is to remove all students attending any charter schools in the District of Columbia (no matter what entity is the authorizer). That solution, I have now learned, can be followed for the years 2003 to 2009 in math and 2005 to 2009 in reading.
The chart below displays results when all students attending charter schools are excluded from the analysis for all years for which information on charter schools is available. The chart shows the extent to which students closed the District-National performance gap annually during the years when the District was under Rhee’s Chancellorship and that of her predecessors. As can be seen, students did better under Rhee’s reign in both 4th and 8th grade reading and math.
Ginsburg now seems prepared to agree with me that the case against Rhee has yet to be established. He says that “longitudinal tracking of students is essential to estimating DC gains.” Inasmuch as Ginsburg never had the data to do the “longitudinal tracking” that he now admits is “essential,” he, in essence, has retracted his original claims.