Early detection of violence is crucial to intercepting potential criminal activities such as murder, rape, and snatching. It is a critical aspect of public safety and security, involving the identification of aggressive behaviours in a wide range of settings. In this research, the authors explore the efficacy of multiple Convolutional Neural Network (CNN) architectures for detecting potentially violent activities, especially in developing countries such as Bangladesh. The models are trained and validated on 2834 images, and real-life video footage is used for testing. The performance of the VGG16, VGG19, and MobileNetV2 architectures is evaluated using Intersection over Union (IoU), whereas the YOLOv8 and YOLO-NAS models are evaluated using mean Average Precision (mAP). YOLOv8 is found to outperform the other architectures on the provided dataset during the training phase, and it also achieves a lower validation loss. The outcomes of this study have significant implications for enhancing security measures, aiding law enforcement, and contributing to the development of more sophisticated surveillance systems. Deploying the models described above can lead to faster and more precise identification of violent activities, thereby promoting public safety and facilitating timely interventions.
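For readers unfamiliar with the IoU metric named above, the following is a minimal sketch of how IoU can be computed for a pair of axis-aligned bounding boxes. It is an illustration only, not the authors' evaluation code: the function name `iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Assumed box format (hypothetical, for illustration):
    (x_min, y_min, x_max, y_max).
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(iou(predicted, ground_truth))
```

An IoU of 1 means the predicted box coincides exactly with the ground-truth box, and 0 means the two boxes do not overlap. mAP, by contrast, aggregates precision over recall levels, with IoU thresholds deciding which detections count as correct.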