Rapid technological advances have significantly improved our ability to analyse video data. This review examines machine learning (ML) models applied to video event detection and classification, including convolutional neural networks (CNNs), deep neural networks (DNNs), and recurrent neural networks (RNNs). Evaluating these approaches on benchmark datasets reveals their relative strengths and weaknesses. The review also addresses the challenges researchers have encountered in video event detection. High detection precision remains difficult to achieve because of diverse event types, variable video quality, the risk of model overfitting, and the lack of large labeled training datasets. Background scenes, lighting conditions, and object occlusion further complicate accurate identification. As datasets and computational power grow, video event detection stands to gain significantly. This review assesses action recognition models trained on the UCF101 and CCV datasets. On CCV, a two-stage neural network achieved 75% accuracy, while a multi-stream deep learning (DL) system reached 77.5%. On the larger UCF101, two-stream and RNN architectures achieved 92% and 89% accuracy, respectively, using video-level prediction.
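The accuracies above are reported at the video level rather than per frame or per clip. The following is a minimal sketch of one common way to obtain a video-level prediction, assuming that a model emits class scores for several clips of each video and that these scores are averaged before taking the top class. The function names, the averaging rule, and the toy scores are illustrative assumptions, not the procedure of any specific model covered in this review.

```python
from collections import defaultdict


def video_level_predictions(clip_scores):
    """Average clip-level class scores per video and return the top class.

    clip_scores: iterable of (video_id, [score_per_class]) pairs.
    Returns a dict mapping video_id to the predicted class index.
    Averaging is an assumed aggregation rule; other rules (e.g. max) exist.
    """
    sums = {}
    counts = defaultdict(int)
    for video_id, scores in clip_scores:
        if video_id not in sums:
            sums[video_id] = [0.0] * len(scores)
        for k, s in enumerate(scores):
            sums[video_id][k] += s
        counts[video_id] += 1

    preds = {}
    for video_id, total in sums.items():
        mean = [t / counts[video_id] for t in total]
        # Index of the class with the highest mean score.
        preds[video_id] = max(range(len(mean)), key=mean.__getitem__)
    return preds


def video_level_accuracy(clip_scores, labels):
    """Fraction of videos whose aggregated prediction matches its label."""
    preds = video_level_predictions(clip_scores)
    correct = sum(1 for vid, y in labels.items() if preds.get(vid) == y)
    return correct / len(labels)


# Hypothetical toy scores for two videos and three classes.
clips = [
    ("v1", [0.6, 0.3, 0.1]),
    ("v1", [0.2, 0.7, 0.1]),
    ("v1", [0.7, 0.2, 0.1]),
    ("v2", [0.1, 0.2, 0.7]),
    ("v2", [0.3, 0.4, 0.3]),
]
labels = {"v1": 0, "v2": 1}

print(video_level_predictions(clips))
print(video_level_accuracy(clips, labels))
```

In this toy case the second clip of `v1` favours class 1 on its own, yet the averaged scores still select class 0, which is why clip-level and video-level accuracy can differ for the same model.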