I joined a conference call today with researcher Marcus A. Winters about his new study, “Transforming Tenure: Using Value-Added Modeling to Identify Ineffective Teachers.”
In the study, released a few hours ago, Winters examined one of the most controversial approaches to teacher evaluation: using student test scores to estimate how much an individual teacher contributes to a student's progress over the years.
Known as the value-added model, or VAM, this approach appeals to lawmakers. Educators, however, argue that it is unreliable because it ignores the many variables involved in a classroom.
A senior fellow at the Manhattan Institute and an assistant professor at the University of Colorado in Colorado Springs, Winters examined teacher data and VAM scores in Florida. He found that a value-added model did predict which teachers would be effective at raising student achievement in future years, but he cautioned that the model should not be used in isolation to determine a teacher's fate.
Still, Winters found that even when VAM was used in isolation, with no other assessment of a teacher's performance, it was a better predictor of later teacher effectiveness than current evaluation methods.
“Everyone knows that teachers matter,” said Winters on the media call. “We know from empirical research that teachers are the most important school-based factor for producing student achievement. We also know teacher quality varies considerably. Some teachers are really great and some teachers aren’t very good at all.”
Currently, tenure is awarded to both, largely because flawed teacher evaluation systems rate virtually all teachers as effective, he said. His study asked two questions: Is value-added a meaningful measure of a teacher's performance, and does it help identify which teachers will be effective in future years?
Yes, according to his findings. Value-added ratings revealed more about a teacher's later classroom effectiveness than whether the teacher holds a master's degree, the proxy many states currently rely on, Winters said.
In fact, had Florida relied solely on VAM scores of its teachers to decide whether to keep or fire them after their first three years in the classroom, the state would have removed many teachers whose later VAM scores were low, said Winters.
Winters acknowledged that the value-added model is imperfect but said that current assessment tools, which pronounce almost all teachers satisfactory, are imperfect too. The difference, he said, is that the current system errs in favor of teachers, while the value-added model, with its slight risk of mislabeling a few average teachers as ineffective, errs in the interest of students.
Because I know this issue is of great interest to the teachers who read this blog, here is what Winters' report says about it:
Critics of VAM analysis rightly point out that, as a statistical tool, VAM must contend with measurement error—the inevitable fact that measurements of the same thing, taken at different times, will vary, and some of this variation will be essentially random. VAM-based measures of teacher performance can be quite imprecise. When VAM is used to inform tenure decisions, it is likely that some average and even above-average teachers could be removed from the classroom because of a low VAM score caused by random variation in measurement over the years, rather than their own failures. The influence of measurement error can be mitigated by statistical adjustments and by incorporating multiple years of student performance when evaluating any particular teacher. But measurement error cannot be eliminated.
From the perspective of teachers (and their unions), the collateral damage of even a single teacher losing tenure from an inaccurately low VAM score is unacceptable. However, the issue is not as cut-and-dried from the perspective of the student. A tenure-reform policy based on VAM will be an improvement for students if it removes enough low-performing teachers to improve overall teacher quality in a school district. If student achievement is our most pressing concern, we need to consider the possible consequences of VAM-based policies on whole districts, even as we acknowledge the potential for error in individual cases.
No evaluation system creates a perfect measure of an employee’s productivity. VAM, then, should not be judged against a nonexistent ideal but rather evaluated for its potential to improve on the current system’s ability to predict future performance.
Now, back to the call:
“It is imperfect,” Winters said. “The question, though, is ‘Can it improve upon our ability to identify teachers who, in future years, are going to be effective in the classroom?’”
As a matter of policy, Winters would not rely solely on value-added scores to determine a teacher's fate. He calls for multiple assessments, including a principal's observation-based judgment of the teacher, made using agreed-upon rubrics of what effective teaching looks like in the classroom.
When a teacher earns low scores on both the subjective measure (the classroom observation) and the objective one (the value-added score), Winters said, school districts can be confident that the teacher is ineffective. When the two assessments disagree, the system should treat it as a red flag and look deeper. He also said that value-added scores should be considered over several years, so that a teacher is not punished for one bad year.
Here is the official conclusion of the report:
Like previous research found in North Carolina, my analysis of Florida data found that pretenure VAM scores often provide information about a teacher’s future quality. Thus, VAM analysis can help replace “automatic” tenure with employment decisions based on reliable evaluations. It can be part of tenure reform and thus can contribute to improving public education in the United States.
But which tenure-reform policies would make best use of this technique? I addressed this question by pinpointing the teachers in the Florida data who would have been removed from the classroom according to several different types of policies and performance standards. I found that any VAM-based policy would have removed teachers who, on average, performed worse than their peers later in their careers.
However, different versions of VAM-based policies proved to have different consequences. Specifically, certain versions increased the risk that effective teachers (as measured by VAM) would be removed. For example, a policy could target teachers for removal if they have two or more periods of consecutive poor performance. Alternately, the policy could simply score teachers on an average of their performance ratings for a given number of years. I found that the latter policy was more likely than the former to result in the removal of effective teachers (teachers who, despite a “bad patch” in the records, would prove to be effective later). Another way to increase this risk of “false positives,” I found, was to set the performance bar high. Such policies, applied to the Florida data, would also have resulted in the removal of teachers who would later demonstrate effective performance.
These results tell tenure reformers that they should consider the number and type of teachers likely to be denied tenure or removed from the classroom under their proposed policies. This will help them design policies that balance the interests of students in need of great teachers and the legitimate interests of teachers concerned that they will be inappropriately removed from the classroom because of a randomly low VAM score.
The need for well-designed policies should not obscure the finding that public schools can indeed use VAM to help identify teachers for tenure or removal. Instead, these results underscore the importance of blending VAM with sound policies. This report does not argue that VAM should be used in isolation to evaluate teachers for tenure or to make any other employment decisions. VAM, as we have seen, is subject to random measurement errors, and so must be combined with other methods of teacher evaluation. The lesson of this report and of other research is that VAM can be a useful piece of a comprehensive evaluation system. Claims that it is unreliable should be rejected. VAM, when combined with other evaluation methods and well-designed policies, can and should be part of a reformed system that improves teacher quality and thus gives America’s public school pupils a better start in life.
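For readers who want to see how the two removal rules Winters compares actually differ, here is a minimal Python sketch. The scores, the cutoff, the units and the function names are all invented for illustration; they are not Winters' model or his Florida data. It also assumes that "two or more periods of consecutive poor performance" means at least two poor years in a row.

```python
# Hypothetical illustration only: scores, threshold, and function names are
# invented and do not come from Winters' report or the Florida data.

def remove_if_consecutive_low(scores, threshold, run_length=2):
    """Flag a teacher whose VAM score is below `threshold`
    in at least `run_length` consecutive years."""
    streak = 0
    for s in scores:
        streak = streak + 1 if s < threshold else 0
        if streak >= run_length:
            return True
    return False


def remove_if_average_low(scores, threshold):
    """Flag a teacher whose mean VAM score across all years is below `threshold`."""
    return sum(scores) / len(scores) < threshold


# Made-up pretenure VAM scores for three years (assumed units: student test-score SDs).
bad_patch_teacher = [0.05, -0.60, 0.05]      # one very bad year, otherwise slightly above average
steady_weak_teacher = [-0.15, -0.12, -0.05]  # below average year after year

threshold = -0.10  # hypothetical performance bar
for name, scores in [("bad patch", bad_patch_teacher),
                     ("steady weak", steady_weak_teacher)]:
    print(name,
          "consecutive rule:", remove_if_consecutive_low(scores, threshold),
          "average rule:", remove_if_average_low(scores, threshold))
```

In this made-up case, both rules flag the steadily weak teacher, but only the averaging rule flags the teacher with a single bad year, the kind of "bad patch" case the report describes. Raising the threshold would flag more teachers under either rule.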
–From Maureen Downey, for the AJC Get Schooled blog