
EssayTagger is a web-based tool to help teachers grade essays faster.
But it is not an auto-grader.

This blog will cover EssayTagger's latest feature updates as well as musings on
education, policy, innovation, and preserving teachers' sanity.

Thursday, September 13, 2012

On teacher accountability, pt1: The trouble with bad data

In Part 1 I lay out the case against teacher accountability measures via "value-added" analysis of standardized test score data. In Part 2 I offer practical compromises.

Here in the Chicagoland area we are in the fourth day of the Chicago Teachers Union (CTU) strike that is making national headlines.

I did my Master of Education and teacher certification program at the University of Illinois-Chicago. Not surprisingly, a lot of my former classmates are current Chicago Public Schools (CPS) teachers. I spoke with them last night as they returned from a day out on the picket lines.

They made it clear that this was about fighting a flawed teacher evaluation system that puts undue emphasis on their students' standardized test scores. They also have serious concerns about the push to privatize the public school system. Then there are the more tangible things they're fighting for, like reduced class sizes (raise your hand if you think 38 teenagers in one room can be productive at anything).

The media and the average Joe on the street think this is about money or benefits or the teachers stubbornly refusing any form of accountability. This is incorrect.

Let's talk about accountability. It's important.
Accountability matters. Teachers should be held to high standards and should be judged by the quality of their work.

Understand that teachers aren't fighting accountability; they're fighting a particular form of accountability that is of dubious value and may indeed be deeply flawed.

Education reformers have latched onto "value-added analysis" as a way to gauge a teacher's impact on his or her students' progress on standardized tests. LA Unified School District and now New York City have even published these value-added scores, which rank the effectiveness of all of their teachers, supposedly as a means of improving transparency and, of course, accountability.

The theory: Good teachers will be identified. Bad teachers will be outed and kicked to the curb. Everyone will be under pressure to step up their game and produce better results. Students will learn more as a result. America wins.

But there's a "but"
This happy story holds together only if the teacher data is valid. And this data has a lot of flaws.

The NYC data had a margin of error of 35% for math teachers.

Its margin of error for English teachers was 53%.

Think about that. With a 53% margin of error, the data is wrong more often than it is right. Your kid's teacher might rank at the 80th percentile of all teachers listed... or might in reality belong down at the 27th percentile (80 minus 53). That's what a 53% margin of error means.
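The arithmetic behind that range is simple enough to sketch. This is a minimal Python illustration, assuming the margin of error is expressed in percentile points and applies symmetrically around the reported rank; the function name `plausible_range` is hypothetical, made up for this example.

```python
def plausible_range(reported_percentile: float, margin: float) -> tuple[float, float]:
    """Return the (low, high) percentile band implied by a symmetric margin of error.

    Assumption: `margin` is in percentile points (53 means plus or minus 53),
    and the band is clipped to the 0-100 percentile scale.
    """
    low = max(0.0, float(reported_percentile) - margin)
    high = min(100.0, float(reported_percentile) + margin)
    return low, high


# A teacher reported at the 80th percentile, using the two NYC margins cited above.
print(plausible_range(80, 53))  # English teachers
print(plausible_range(80, 35))  # math teachers
```

Before clipping, a band of plus or minus 53 points spans 106 points, which is wider than the entire 0-100 percentile scale.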

A margin of error of 53% means that a COIN FLIP is a more accurate way to determine which teachers are good and which ones are bad.

Other analyses have shown that if you swap in a different test, you can get wildly different results:
40 percent of the teachers who scored in the bottom quartile based on their students’ state standardized test scores actually placed in the top half of teachers when an alternative assessment was used.
If the analysis were truly valid, it would not be subject to test-to-test variation of this degree. The method also fails to provide stable data on teachers' year-to-year performance. Linda Darling-Hammond et al. note:
A study examining data from five school districts found, for example, that of teachers who scored in the bottom 20% of rankings in one year, only 20% to 30% had similar ratings the next year, while 25% to 45% of these teachers moved to the top part of the distribution, scoring well above average. The same was true for those who scored at the top of the distribution in one year: A small minority stayed in the same rating band the following year, while most scores moved to other parts of the distribution.
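To make that kind of year-to-year comparison concrete, here is a minimal Python sketch of how a persistence figure can be computed: rank the same teachers in two years, take those in the bottom 20% in year one, and count how many are still in the bottom 20% in year two versus how many land in the upper part of the distribution. The function names, the "top 40%" cutoff for "upper part," and the sample scores are all hypothetical, invented for illustration; they are not the study's method or data.

```python
from bisect import bisect_left


def percentile_ranks(scores: dict[str, float]) -> dict[str, float]:
    """Map each teacher to the share of teachers scoring strictly below them (0.0 up to < 1.0)."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {teacher: bisect_left(ordered, s) / n for teacher, s in scores.items()}


def bottom_band_persistence(year1, year2, bottom=0.20, top=0.60):
    """Follow teachers ranked in the bottom `bottom` share in year 1.

    Returns (share still in the bottom band in year 2,
             share ranked at or above `top` in year 2).
    Assumes the same teachers appear in both years.
    """
    r1 = percentile_ranks(year1)
    r2 = percentile_ranks(year2)
    low_in_year1 = [t for t in year1 if r1[t] < bottom]
    if not low_in_year1:
        return 0.0, 0.0
    stayed = sum(r2[t] < bottom for t in low_in_year1) / len(low_in_year1)
    rose = sum(r2[t] >= top for t in low_in_year1) / len(low_in_year1)
    return stayed, rose


# Hypothetical scores for ten teachers; not real data.
year1 = {t: float(i) for i, t in enumerate("ABCDEFGHIJ", start=1)}
year2 = {"A": 2.0, "B": 9.5, "C": 1.0, "D": 4.0, "E": 3.0,
         "F": 6.0, "G": 5.0, "H": 8.0, "I": 7.0, "J": 10.0}

stayed, rose = bottom_band_persistence(year1, year2)
print(f"stayed in bottom 20%: {stayed:.0%}, moved to top 40%: {rose:.0%}")
```

The same function works unchanged if the two dictionaries hold rankings from two different tests in the same year rather than the same test in two different years, which is the test-to-test comparison quoted earlier.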
Darling-Hammond et al. also detail many other confounding factors that distort and invalidate value-added data. If you really want your mind blown, read the second paragraph of their section titled "2. Teachers’ value-added performance is affected by the students assigned to them."

So where does this leave us?
CPS' current evaluation proposal uses this sort of value-added data for 25% of a teacher's evaluation, rising to 40% over the next five years. Given the method's flaws, that is far too large a share of the evaluation.
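A rough way to see why the weight matters: suppose the final rating were a simple weighted average of a value-added percentile and all other measures on the same 0-100 scale. That formula is an assumption for illustration only; the actual CPS formula is not described here. Under it, any uncertainty in the value-added score flows straight into the final rating, scaled by its weight. The function names and the "other measures" score of 70 below are hypothetical.

```python
def composite(value_added: float, other: float, weight: float) -> float:
    """Hypothetical rating: weighted average of a value-added percentile and
    all other measures combined, both on a 0-100 scale."""
    return weight * value_added + (1 - weight) * other


def composite_range(va_low: float, va_high: float, other: float, weight: float) -> tuple[float, float]:
    """Lowest and highest composite rating when only the value-added score is uncertain."""
    return composite(va_low, other, weight), composite(va_high, other, weight)


# Value-added band for a teacher reported at the 80th percentile with a
# 53-point margin (27th to 100th percentile); other measures held at a
# hypothetical 70.
for weight in (0.25, 0.40):
    low, high = composite_range(27, 100, 70, weight)
    print(f"value-added weight {weight:.0%}: rating between {low:.2f} and {high:.2f}")
```

In this model the spread of the final rating is just the weight times the width of the value-added band (73 points): 18.25 points at a 25% weight, growing to 29.2 points at 40%, with nothing about the teacher's actual work having changed.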

Accountability makes sense, but only if a sound, accurate measure is used.

In Part 2 I'll offer some practical compromises that might be a way around some of the thorniest issues.