Articulation training with multiple kinds of stimuli, such as visual, auditory, and articulatory information, can teach users to pronounce correctly and improve their articulatory ability. In this paper, an articulation training system with an intelligent interface and multimodal feedback is proposed to improve the performance of articulation training. Clinical knowledge of speech evaluation is used to design a dependent network, and automatic speech recognition with this dependent network is applied to identify pronunciation errors. In addition, a hierarchical Bayesian network is proposed to recognize the user’s emotion from speech. Based on the identified pronunciation errors and the user’s emotional state, articulation training sentences are selected dynamically. Finally, a 3D facial animation teaches users to pronounce each sentence using speech, lip motion, and tongue motion. Experimental results demonstrate the usefulness of the proposed method and system.