Several `general public' domains, such as e-learning, tele-assistance, and human-machine interaction, benefit from visual speech information. The helpfulness of visual information is even more pronounced for people with special needs: one can imagine, for example, a dependent person commanding a machine with a simple lip movement or the pronunciation of a single syllable. Moreover, hard-of-hearing people typically compensate for their hearing loss by lip-reading the person they are talking to, in addition to listening. It is therefore important to build an automatic lip-reading system that partially alleviates hearing problems. In this paper we present a new hybrid approach, named ALiFE, to automatically localize lip feature points in the speaker's face and to carry out spatio-temporal tracking of these points. The ALiFE prototype is evaluated with multiple speakers under natural conditions. To validate our lip feature extraction approach, we have created a dedicated audio-visual corpus. The final results show the different configurations of the mouth across visemes. Later, these visemes will be associated with relatively precise physical measurements.