This paper applies the structural representation of pronunciation to computer-aided language learning (CALL). This representation was proposed to remove non-linguistic features such as age, gender, and speaker from speech acoustics (N. Minematsu et al., 2005). The removal is achieved by extracting only the interrelations among speech events and discarding their absolute properties, such as formants and spectrum envelopes. Taken together, the extracted interrelations mathematically form the external phonological structure of the events. Using this representation, S. Asakawa et al. (2005) extracted the vowel structure of a language learner and showed that its development through training can be traced and visualized adequately. This structural visualization can be regarded as a pronunciation portfolio (N. Minematsu et al., 2004). This paper shows that the new representation can classify language learners adequately and indicate which vowels should be corrected with priority.
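
The idea of keeping only interrelations and discarding absolute properties can be illustrated with a minimal sketch. It is not the method of the cited work: the Euclidean distance, the two-dimensional feature vectors, the constant-offset model of a speaker difference, and the names `structure` and `shift` are all assumptions made for illustration, and the numbers are hypothetical toy values rather than measurements.

```python
import math


def structure(events):
    """Return sorted labels and the matrix of pairwise distances.

    `events` maps a label to a feature vector (a tuple of floats).
    Only the interrelations (distances) are kept; the absolute
    positions of the events are discarded.
    Assumption: Euclidean distance is used purely for illustration.
    """
    labels = sorted(events)
    matrix = [[math.dist(events[a], events[b]) for b in labels] for a in labels]
    return labels, matrix


def shift(events, offset):
    """Translate every event by the same offset (a toy speaker difference)."""
    return {k: tuple(x + d for x, d in zip(v, offset)) for k, v in events.items()}


# Hypothetical toy values, not data from the paper.
speaker_a = {"a": (800.0, 1200.0), "i": (300.0, 2300.0), "u": (350.0, 900.0)}
speaker_b = shift(speaker_a, (50.0, 150.0))

labels, struct_a = structure(speaker_a)
_, struct_b = structure(speaker_b)

# The absolute positions differ, yet the two structures coincide.
same = all(
    math.isclose(x, y)
    for row_a, row_b in zip(struct_a, struct_b)
    for x, y in zip(row_a, row_b)
)
print(labels, same)
```

In this toy setting the two speakers have different absolute feature values but identical distance matrices, which is the sense in which a purely relational description can ignore speaker-dependent properties. Real speaker differences are not simple offsets, so the sketch shows only the principle.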