In the midst of a controversial LA Times series linking teacher performance in that district to test scores, the Economic Policy Institute released a new briefing paper today cautioning against using student test scores, via value-added models (VAM), to judge teacher performance.
The 27-page paper — by a blue ribbon collection of education researchers including Eva L. Baker, Paul E. Barton, Linda Darling-Hammond, Edward Haertel, Helen F. Ladd, Robert L. Linn, Diane Ravitch, Richard Rothstein, Richard J. Shavelson, and Lorrie A. Shepard — lists many negative impacts of judging teachers largely on student test scores. The authors also point to studies documenting the unreliability of those scores.
My first response to this paper is to wonder whether any school system has the time, resources, or staffing to conduct the thoughtful, deeper evaluations these researchers recommend. The comprehensive evaluation model they suggest could be applied in many professions, except that it calls for resources and time that I don't think many companies have any longer. And I certainly don't think schools do in the current bleak climate, which will likely persist for a few more years.
Here are key passages, but please read the entire paper. I think many of you will be applauding its position. I have also added the link to the LA Times investigation, which is a powerful piece of journalism and is creating quite a stir. So, forget about the wash and the grocery store, read both of these, and let us know what you think:
A review of the technical evidence leads us to conclude that, although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation. Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing tests of basic skills in math and reading. Based on the evidence, we consider this unwise.
Any sound evaluation will necessarily involve a balancing of many factors that provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning.
For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers. VAM estimates have proven to be unstable across statistical models, years, and classes that teachers teach. One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Another found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year.
Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. The same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis. This runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time and raises questions about whether what is measured is largely a “teacher effect” or the effect of a wide variety of other factors.
A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness.
VAM’s instability can result from differences in the characteristics of students assigned to particular teachers in a particular year, from small samples of students (made even less representative in schools serving disadvantaged students by high rates of student mobility), from other influences on student learning both inside and outside school, and from tests that are poorly lined up with the curriculum teachers are expected to cover, or that do not measure the full range of achievement of students in the class.
The paper concludes:
Although some advocates argue that admittedly flawed value-added measures are preferred to existing cumbersome measures for identifying, remediating, or dismissing ineffective teachers, this argument creates a false dichotomy. It implies there are only two options for evaluating teachers—the ineffectual current system or the deeply flawed test-based system. Yet there are many alternatives that should be the subject of experiments. The Department of Education should actively encourage states to experiment with a range of approaches that differ in the ways in which they evaluate teacher practice and examine teachers’ contributions to student learning. These experiments should all be fully evaluated.
There is no perfect way to evaluate teachers. However, progress has been made over the last two decades in developing standards-based evaluations of teaching practice, and research has found that the use of such evaluations by some districts has not only provided more useful evidence about teaching practice, but has also been associated with student achievement gains and has helped teachers improve their practice and effectiveness.
Structured performance assessments of teachers like those offered by the National Board for Professional Teaching Standards and the beginning teacher assessment systems in Connecticut and California have also been found to predict teachers' effectiveness on value-added measures and to support teacher learning. These systems for observing teachers' classroom practice are based on professional teaching standards grounded in research on teaching and learning. They use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to the teacher's instruction.
Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, should include test scores. Given the importance of teachers’ collective efforts to improve overall student achievement in a school, an additional component of documenting practice and outcomes should focus on the effectiveness of teacher participation in teams and the contributions they make to school-wide improvement, through work in curriculum development, sharing practices and materials, peer coaching and reciprocal observation, and collegial work with students.