The purpose of this study was to test methods that strengthen comparability claims about annual determinations of student proficiency in English language arts, math, and science (Grades 3-12) in the New Hampshire Performance Assessment of Competency Education (NH PACE) pilot project. First, we examined the literature to define comparability outside the bounds of strict score interchangeability and explored methods for estimating comparability that support a balanced assessment system for state accountability, such as the NH PACE pilot. Second, we applied two strategies (consensus scoring and a rank-ordering method) to estimate comparability in Year 1 of the NH PACE pilot, based on the expert judgment of 85 teachers using 396 student work samples. We found that the methods were effective both for providing evidence of comparability and for detecting when threats to comparability were present. The evidence did not indicate meaningful differences in district average scoring and therefore did not support adjustments to the district-level cut scores used to create annual determinations. The article concludes with a discussion of the technical challenges and opportunities associated with innovative, balanced assessment systems in an accountability context.