This paper proposes an approach for tracking regions of interest (ROIs) in thermal video, from which vital signs can be measured for emotion recognition. The proposed tracking framework addresses several problems associated with this goal, mainly the size of the ROI, appearance variations in the ROI that accompany physiological changes, and the duration of tracking in a practical setting. The framework consists of three modules: an adaptive particle filter tracker, an online detector, and a module that integrates the outputs of the other two for learning and for the final decision. The template of the adaptive particle filter tracker is updated based on the output of the learning and decision module to avoid drift. In the detector module, a randomized classifier detects the ROI, and its output is then refined by removing false positives with a proposed geometrical constraint. The proposed framework is tested on 32 human subjects with different physiological changes and compared with state-of-the-art approaches. Experimental results show that the proposed method outperforms the others.