Face detection is an important task in computer vision and is widely used in security, human-machine interaction, identity recognition, and other applications. Many existing methods have been developed for image-based face pose estimation, but few of them can be directly extended to videos, even though video-based face pose estimation is more important and more frequently used in real applications. This paper describes a method for automatic face pose estimation from videos based on a mixture-of-trees model and optical flow. The traditional mixture-of-trees model, applied to a sequence of faces in a video, can easily lose faces or report wrong angles. Our method is more robust because it takes the spatio-temporal consistency of the face pose across the video into account. To preserve this consistency from one frame to the next, the method uses optical flow computed on the video to guide the mixture-of-trees face pose estimation. We evaluate the method extensively on videos containing different faces and different pose angles. Both visual and statistical results demonstrate its effectiveness for automatic face pose estimation.
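
To make the idea of flow-guided consistency concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes a hypothetical per-frame detector that returns several pose candidates (angle, score, face center), and it assumes the optical flow inside the previous face region is already available as a list of (dx, dy) vectors. The names `Candidate`, `shift_by_flow`, `select_consistent`, and the weights `w_pos` and `w_angle` are illustrative assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One pose hypothesis from a (hypothetical) per-frame detector."""
    angle: float                  # pose angle in degrees
    score: float                  # detector confidence, higher is better
    center: Tuple[float, float]   # face center in pixels (x, y)


def shift_by_flow(center: Tuple[float, float],
                  flow_vectors: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Predict the face center in the next frame by adding the mean flow vector."""
    if not flow_vectors:
        return center
    mean_dx = sum(dx for dx, _ in flow_vectors) / len(flow_vectors)
    mean_dy = sum(dy for _, dy in flow_vectors) / len(flow_vectors)
    return (center[0] + mean_dx, center[1] + mean_dy)


def select_consistent(candidates: List[Candidate],
                      predicted_center: Tuple[float, float],
                      prev_angle: float,
                      w_pos: float = 0.05,
                      w_angle: float = 0.02) -> Optional[Candidate]:
    """Pick the candidate with the lowest cost.

    Cost = -score + w_pos * distance to the flow-predicted center
                  + w_angle * absolute change in angle from the previous frame.
    The weights are assumed values chosen only for illustration.
    """
    if not candidates:
        return None

    def cost(c: Candidate) -> float:
        dist = hypot(c.center[0] - predicted_center[0],
                     c.center[1] - predicted_center[1])
        return -c.score + w_pos * dist + w_angle * abs(c.angle - prev_angle)

    return min(candidates, key=cost)


# Usage: the previous face sat at (100, 100) with a 10-degree pose.
predicted = shift_by_flow((100.0, 100.0), [(4.0, 0.0), (6.0, 0.0)])
frame_candidates = [
    Candidate(angle=15.0, score=0.9, center=(106.0, 101.0)),
    Candidate(angle=-60.0, score=1.0, center=(300.0, 40.0)),
]
best = select_consistent(frame_candidates, predicted, prev_angle=10.0)
```

In this sketch the second candidate has the higher detector score, but it lies far from the flow-predicted position and implies a 70-degree jump, so the first candidate is selected. A per-frame detector used alone would have picked the inconsistent one.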