New Research on Components of Teacher Evaluations
Two interesting studies were released in the past couple of weeks that look at implementation of teacher evaluation models and expose a few challenges that districts and states will need to address as they refine these new systems.
The first study, by researchers Morgan Polikoff and Andrew Porter, examines the use of value-added measures (VAM) in teacher evaluations and their relationship to instruction. Interestingly, the study finds a disconnect between what students are being assessed on and what teachers are actually teaching in the classroom.
If we expect students to demonstrate their learning of standards on standardized assessments, then we would want to know that teachers are actually teaching those standards. But Polikoff and Porter find that what teachers are teaching isn’t highly aligned to what we’re expecting kids to learn, and as a result VAM is not necessarily a good indicator of how well a teacher is teaching content.
But I’m not sure that this is what we need VAM to tell us. VAM is used to estimate a teacher’s impact on her students’ growth in learning the state standards. And for that purpose, it’s pretty reliable. So if one of the things we’re looking for when we evaluate teachers is whether or not her students demonstrate higher achievement on specific standards over the course of the year, VAM can help with that.
What VAM doesn’t do is tell us what’s actually happening in the classroom – what sort of instructional practice is taking place. For that, we need other measures, like classroom observations.
Classroom observations have been around for a while; they are designed to help us understand how teachers deliver instruction and the quality of the strategies they use to meet student needs and promote learning. With all of the focus on improving teacher evaluations, folks are starting to take a closer look at how observations are conducted as well.
Enter the second report, this one from the Brown Center on Education Policy at the Brookings Institution. This paper reveals that biases among principals may be reflected in classroom observation ratings of teachers. Specifically, teachers whose students had higher incoming achievement levels were more likely to receive a higher rating than teachers whose students had lower incoming achievement levels.
The fix for this is pretty straightforward. Districts could adjust for student demographics when using observation scores (something VAM is already capable of doing) and they could better train principals and observers on how to conduct and score observations.
Taken together, these studies confirm a key principle of educator evaluations: schools and districts must use multiple measures to evaluate a teacher. No single measure is perfect or gives us the whole picture, but each measure tells us something different and can shore up another where it is weak.
What’s key is that we understand what each measure is supposed to tell us and how the measures are supposed to work together. These studies help with that. With each report, researchers are finding more pieces of the puzzle of how to measure great teaching and how to assemble them into a picture that makes sense.
What concerns me is that some people are taking these studies to mean that districts or states should give up on improving their evaluation systems. Even Michael Petrilli, who is no stranger to the need for patience when implementing complex reforms, said that we “need to get back to common sense – trusting principals to make judgments.” (But isn’t that exactly what the Brookings study shows is a bit flawed?)
Of course we should empower school leaders to make personnel decisions based on what’s best for their schools and the teams they’re trying to build. But principals can’t do that without good data and evidence to inform their decisions. Meaningful evaluations fill that need.
So how should states and districts move forward?
They have a clear roadmap for improving evaluations, starting with better training for principals on how to use the components of these new evaluation instruments – stronger observational rubrics, more robust data – and how to interpret the results.
The Polikoff/Porter paper also demonstrates that to get more out of VAM, states must improve their assessments and ensure that what’s being taught in the classroom is what’s being measured on the test.
But districts and states can’t wait for these improvements to happen before they implement new evaluations. Teachers need meaningful feedback now; principals must be able to identify whether instruction is aligned to standards – and if it is not, help their teachers to improve.
Moreover, districts must hold principals accountable for doing the hard work of managing and developing their instructional teams. School leaders need to conduct evaluations in a thoughtful way and follow the evidence. Teachers whose students are far behind shouldn’t be penalized.
Every school must focus on growth in learning. In each classroom and across the school, is it happening or not? How is each member of the instructional team contributing to that effort? Without meaningful evaluations of teacher practice, it is very hard to answer these questions.
So rather than running away from the hard work of developing and implementing meaningful, multiple-measure teacher evaluation systems, school, district, and state leaders should look for opportunities to make those systems better. These two studies can help point leaders in the right direction.