Food intake monitoring plays an important role in personal dietary systems. Existing video-based eating activity monitoring systems typically rely on recordings made with the same device in a single laboratory setting. In contrast, we explore videos recorded with smartphones for recognizing eating gestures in home environments. For this purpose, we collected 20 eating sessions from 14 participants using different smartphones and labelled the data into eating and no-eating classes. To recognize eating activity from video, we employed three deep learning approaches: 3D CNN, SlowFast network, and CNN-LSTM. The SlowFast network achieved the best F1-score, 0.560, when evaluated using the Leave-One-Subject-Out (LOSO) scheme. These preliminary results suggest that video-based food intake monitoring can be used in home environments. However, our models failed to recognize eating activity when the user bends to pick food from the plate. More videos showing this eating style need to be included in the training data to improve performance.
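To illustrate the Leave-One-Subject-Out (LOSO) scheme mentioned above, the following is a minimal Python sketch under stated assumptions, not the implementation used in this work. Each fold holds out all clips of one participant for testing and trains on the remaining participants, so no subject appears on both sides of a split. The toy subject IDs and labels, the helper names (`loso_splits`, `f1_score`, `majority_baseline`), the majority-class stand-in for a video model, and the averaging of per-fold F1-scores are all assumptions made for illustration.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    for held_out in sorted(by_subject):
        test_idx = list(by_subject[held_out])
        train_idx = sorted(
            i for s, idxs in by_subject.items() if s != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (eating) class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def majority_baseline(train_labels, n_test):
    """Hypothetical stand-in for a trained model: predict the majority training label."""
    majority = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return [majority] * n_test


# Toy data (assumed): one entry per video clip, 1 = eating, 0 = no-eating.
subject_ids = ["P01", "P01", "P01", "P02", "P02", "P03", "P03", "P03"]
labels = [1, 0, 1, 1, 0, 0, 1, 1]

fold_scores = {}
for held_out, train_idx, test_idx in loso_splits(subject_ids):
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    predictions = majority_baseline(train_labels, len(test_labels))
    fold_scores[held_out] = f1_score(test_labels, predictions)

# Assumption: the overall score is the unweighted mean of per-fold F1-scores.
mean_f1 = sum(fold_scores.values()) / len(fold_scores)
print(fold_scores, round(mean_f1, 3))
```

In practice, `majority_baseline` would be replaced by training and inference with one of the video models (for example, the SlowFast network), and the per-fold scores could also be pooled rather than averaged, depending on the reporting convention.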