This study examines the reliability of automated essay scoring (AES) and investigates the validity of the writing (sub)constructs that Criterion does and does not measure. Criterion's evaluations of test-takers' essays, written in response to TOEFL iBT independent writing tasks, were compared with human raters' evaluations. In particular, the study explored which essay features were most closely related to each of Criterion's six analytic dimensions. Five prompt types were used to create a writing test administered to fifty college students in Seoul. The results showed moderate agreement between the human raters and Criterion. Three essay features (development, organization, and grammar/usage) were the crucial predictors of the holistic score in human rating, whereas organization was the most powerful predictor of the AES overall score, followed by mechanics and then sentence variety/construction. This discrepancy in the features that predict writing scores may reflect that certain sentence constructions were evaluated differently by Criterion and the human raters. The result suggests that Criterion's feature dimensions need refinement with respect to the constructs they are intended to measure. The findings have implications for teaching process writing and for the use of AES.
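The abstract does not specify which statistics were used, but analyses of this kind are commonly run as a weighted kappa for human-machine agreement and a multiple regression of holistic scores on analytic feature scores. The sketch below is a minimal illustration under that assumption only; the data, variable names, and score scales are entirely hypothetical and are not taken from the study.

```python
# Hypothetical sketch of the two analyses the abstract describes:
# (1) human-Criterion agreement, (2) which analytic features predict scores.
# All data below are invented; the study's actual method is not given here.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Invented 0-5 holistic scores for 50 essays from a human rater and from AES.
human = rng.integers(1, 6, size=50)
aes = np.clip(human + rng.integers(-1, 2, size=50), 0, 5)

# Quadratically weighted kappa is a common human-machine agreement index;
# values around 0.4-0.6 are conventionally read as "moderate" agreement.
kappa = cohen_kappa_score(human, aes, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.2f}")

# Invented analytic feature scores (e.g., development, organization,
# grammar/usage, mechanics, sentence variety, style), all on a common
# 0-5 scale so the regression coefficients are roughly comparable.
features = rng.uniform(0, 5, size=(50, 6))

# Regressing holistic scores on the six features indicates which
# dimensions carry the most predictive weight for the overall score.
model = LinearRegression().fit(features, human)
print("coefficients:", np.round(model.coef_, 2))
```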