Three major changes to New York City’s teacher evaluation system this year – more frequent and often unannounced observations, the shared definition of good and not-so-good teaching established by the Danielson rubric, and the shift from a two-point “satisfactory/unsatisfactory” to a four-point scale – have real potential to improve teaching and learning. But implementing all three at once, with high stakes attached, has definitely raised people’s blood pressure. Supervisors are on a steep learning curve mastering the rubric, getting into classrooms more frequently, and evaluating teachers on the new scale. Teachers scoring at Level 2, or “Developing,” are getting a bit of a jolt, since for years they were deemed “Satisfactory.” And then there’s the Measures of Student Learning (MOSL) process for using student achievement as part of evaluation.

All this anxiety might be considered a normal, transient part of the change process, but as I’ve coached New York City principals this year, I’ve become increasingly concerned about the way the rubric is being used. Charlotte Danielson originally designed her framework as a coaching tool, and she is a passionate advocate for a supportive, developmental approach to teacher supervision and evaluation. As her rubric has morphed into a high-stakes evaluation instrument in districts around the country, policymakers and union officials have made a series of decisions with very little research to guide them. The New York decision I’m most concerned about is requiring principals to rubric-score teachers after each classroom visit – not on all the elements, but on as many as possible. I see eight problems with this approach:

First, thinking in terms of rubric scoring during a single classroom visit distracts supervisors from being thoughtful, perceptive, open-minded observers of instruction. As Yogi Berra famously said, “You can observe a lot by watching.” In my years as a principal and a coach, I’ve made thousands of brief and full-length classroom visits and found that the only way to assess what’s going on is to keep my head up, listen carefully to teacher/student interactions, scan what’s on the walls, look over students’ shoulders to assess the instructional task, and quietly ask one or two students, “What are you working on?” Trying to do all this and think about a detailed rubric is asking way too much of supervisors and inevitably degrades the quality of teacher feedback. Better to jot a few quick notes, decide what’s most important, and talk to the teacher afterward to learn more about the context.

Second, giving teachers rubric scores after a classroom visit creates a dynamic that is top-down, evaluative, and bureaucratic. Even if teachers self-assess and “co-construct” their ratings with the principal, as Danielson recommends, the interaction is skewed toward judgment and away from coaching. The fact that the teacher must sign a form and knows that the evaluation is probably going to be uploaded into the city’s Advance system adds to the stress. Of course, persistently unsatisfactory teachers should be told, via rubric scores, that their performance is unacceptable and needs to improve immediately. But for teachers at Levels 3 and 4, rubric scoring after each observation is unnecessary, and for teachers at Level 2, coaching and support should focus on one or two specific ways to improve performance.

Third, most of the principals I’m working with have been led to believe that during post-observation conversations, supervisors aren’t allowed to take into account what teachers say happened before or after a visit. This legalistic mindset (“You can’t evaluate what you didn’t witness”) introduces an element of distrust and undermines the quality of post-observation conversations, especially after short, unannounced visits.

Fourth, New York isn’t requiring face-to-face conversations after brief observations. This means that in super-busy schools, all too many supervisors and teachers will communicate electronically and won’t sit down and talk – which is where there’s the greatest potential for instructional improvement.

Fifth, principals and other supervisors are being asked to provide written “evidence” on each observation, and the sample evaluations distributed earlier this school year gave the clear impression that lots of evidence is expected. I’ve seen principals spending an hour or more rating and writing specifics after a classroom visit, and some teachers are going overboard compiling their own binders, even though artifacts aren’t required and count for only 5 percent of the overall score. All this paperwork makes each observation more time-consuming and cuts down on the frequency of visits and conversations.

Sixth, New York isn’t requiring enough classroom visits to get an accurate sampling of each teacher’s performance. In my experience, it takes at least 10 short, unannounced visits – about one a month – to get a good sense of what a teacher is doing on an everyday basis. With fewer visits, supervisors’ feedback isn’t as authentic or helpful.

Seventh, getting feedback on 10-15 areas of the rubric can be overwhelming for teachers – especially if the feedback is critical. A well-established principle of athletic coaching is to focus on one or two points at a time, and this applies to classroom coaching as well. Administrators should focus on one or two leverage points and how to present them to the teacher most effectively. Flooding the teacher with 10-15 ratings and pieces of feedback is counterproductive.

Finally, rubrics like Danielson’s are not designed for visit-by-visit feedback and evaluation, and there’s no research showing that using them in this manner is effective. A good rubric provides a comprehensive description of a teacher’s overall performance – a way of summing up information from classroom visits, team and faculty meetings, students, parents, and the teacher’s own self-assessment in a detailed end-of-year evaluation. Using the rubric to score each lesson – with all the paperwork that entails – means that administrators won’t be in classrooms often enough to see daily reality, evaluations will be less accurate and fair, and the process won’t improve much of anything.

Here’s an alternative approach that’s being tried in a number of districts around New York State and elsewhere in the United States:

  • Use the rubric only at three strategic points in the year: (a) in September, all teachers do an honest self-assessment on the full rubric and agree with their supervisor on two or three improvement goals; (b) in mid-January, teachers meet with their supervisor, compare the teacher’s current self-assessment on the whole rubric with the supervisor’s current assessment, and discuss any disagreements (the scores aren’t official); and (c) in June, teachers and supervisors repeat the January compare-and-discuss process, only this time the scores count and teacher and supervisor sign off.
  • The rest of the year, supervisors make frequent classroom visits (at least 10 short visits per teacher per year to get an accurate sampling of daily practice and reassure the teacher that one or two bad moments can be redeemed). Every visit (including full lessons) is followed promptly by a brief face-to-face conversation focused on one or two coaching points.
  • Supervisors follow up each visit/conversation with a brief written summary uploaded to Advance and forwarded electronically to the teacher (who can respond if necessary). One idea, borrowed from a widely used teacher-evaluation software package, is to limit these written summaries and responses to 1,000 characters.
  • The only exception to this pattern is with clearly unsatisfactory performance, which should be immediately flagged with rubric scores and lead to an improvement plan, intensive support, and, if performance doesn’t improve in a reasonable period of time, dismissal.

If the UFT and Department of Education agree that this is a plausible alternative that might produce the same or better results, the State Education Department could be persuaded to allow individual schools to hold School Based Option (SBO) votes to use this approach during the 2014-15 school year. A faculty vote would be a clear statement that teachers trust their principal to be in their classrooms more frequently, give feedback in a different mode, listen to their input, and evaluate them fairly in June.

There’s a lot of terrific teaching in New York City, and it needs to be acknowledged and praised. But there’s also a fair amount of less-than-stellar teaching, and improving it really matters for the kids – and for the teachers down the hall. I believe the approach described here would relieve supervisors of unproductive paperwork, get them into classrooms much more frequently, stimulate frequent, authentic conversations about instruction, and bring about major improvements in teaching and learning.