The consistency between raters over 3 years of a high-stakes performance assessment was examined in 2 studies that involved students in Grades 3, 5, and 8. The students' performance was evaluated in reading, writing, language usage, mathematics, science, and social studies. The results showed that the groups of raters used in different years differed in severity. Their consistency tended to improve over the years, but differences between the rater groups remained. These differences could affect students' proficiency classifications, which indicates a need to adjust for rater effects during the equating process. The Grade 8 raters generally were more consistent than the Grade 3 and Grade 5 raters. In addition, the raters in mathematics generally were the most consistent, those in the language arts areas were the least consistent, and the consistency of raters in science and social studies varied across grade levels.