Content-based video retrieval (CBVR) is the task of selecting desired videos from a large collection using features extracted from the videos themselves. The extracted features are indexed, categorized, and used to retrieve relevant videos while excluding irrelevant ones. A video can be represented by the audio, text, faces, and objects that appear in its frames. Retrieval systems can still be improved in several respects, one of which is the need for an Arabic retrieval system. This study presents an advanced Arabic multimodal video retrieval system. The system uses the Apache Solr search engine to retrieve videos from a text query, and state-of-the-art techniques to retrieve videos from image and audio queries. For audio-based retrieval, the system achieves a Mean Average Precision at 1 (MAP@1) of 0.9333 and a MAP@3 of 0.8129; for image-based retrieval, the corresponding MAP@1 and MAP@3 scores are 0.9800 and 0.8666. Additionally, a novel hybrid recommendation system that combines content-based and collaborative filtering is presented. The proposed model performs robustly on several key metrics, with a MAP@10 of 0.3246 and a Normalized Discounted Cumulative Gain at 10 (NDCG@10) of 0.4939. Its Item Coverage of 0.6905 for the top 100 recommendations shows that it can offer a broad range of recommendations. The hybrid system also addresses popularity bias and the ‘new item’ cold-start problem of collaborative filtering. Popularity-based recommendations serve as a fallback for the ‘new user’ cold-start problem in both the collaborative filtering and content-based components.
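The ranking metrics reported for retrieval and recommendation (MAP@k, NDCG@k, Item Coverage) can be computed along the lines of the following minimal Python sketch. It assumes binary relevance, one common normalisation of AP@k (dividing by min(k, number of relevant items)), and Item Coverage defined as the fraction of the catalogue that appears in at least one top-N list. The study's exact definitions may differ, and the function names are hypothetical.

```python
import math
from typing import Hashable, Iterable, List, Sequence, Set, Tuple

# Hypothetical helpers; metric conventions below are assumptions, not the study's code.


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """AP@k: sum of precision@i at each rank i <= k holding a relevant item,
    divided by min(k, |relevant|)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at rank i
    return total / min(k, len(relevant))


def mean_average_precision_at_k(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """MAP@k: mean AP@k over (ranked_list, relevant_set) pairs, one per query or user."""
    runs = list(runs)
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Binary-relevance NDCG@k with the 1 / log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k) at the top
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, min(k, len(relevant)) + 1))
    return dcg / idcg if idcg > 0 else 0.0


def item_coverage(recommendation_lists: Iterable[List[Hashable]], catalog_size: int) -> float:
    """Fraction of the catalogue recommended to at least one user."""
    recommended: Set[Hashable] = set()
    for recs in recommendation_lists:
        recommended.update(recs)
    return len(recommended) / catalog_size


if __name__ == "__main__":
    runs = [(["v1", "v2", "v3"], {"v1", "v3"}), (["v4", "v5", "v6"], {"v6"})]
    print(mean_average_precision_at_k(runs, 1))  # first query hits at rank 1, second misses
    print(mean_average_precision_at_k(runs, 3))
    print(ndcg_at_k(["v1", "v2", "v3"], {"v1", "v3"}, 3))
    print(item_coverage([["v1", "v2"], ["v2", "v3"]], catalog_size=4))
```

With these conventions, MAP@1 rewards only the first result, which is why it is typically higher than MAP@3 when the top hit is usually correct.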
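To illustrate how a hybrid of content-based and collaborative filtering with a popularity fallback can behave, here is a hypothetical Python sketch; it is not the authors' model. It assumes a linear blend (hypothetical weight `alpha`) of a content score (cosine similarity between item feature vectors) and an item-item collaborative score (cosine similarity over the binary user-item matrix), each taken as the maximum over the items the user has already watched.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vector -> similarity 0


def recommend(
    user: str,
    history: Dict[str, Set[str]],      # user -> watched items (assumed all present in `features`)
    features: Dict[str, List[float]],  # item -> content feature vector (hypothetical representation)
    k: int = 3,
    alpha: float = 0.5,                # hypothetical weight of the content-based score
) -> List[str]:
    seen = history.get(user, set())
    candidates = [item for item in features if item not in seen]

    if not seen:
        # New-user fallback: rank unseen items by global popularity (number of viewers)
        popularity = Counter(item for items in history.values() for item in items)
        return sorted(candidates, key=lambda i: (-popularity[i], i))[:k]

    users = sorted(history)

    def interaction_vector(item: str) -> List[float]:
        # Column of the binary user-item matrix for this item
        return [1.0 if item in history[u] else 0.0 for u in users]

    scores = {}
    for cand in candidates:
        content = max(cosine(features[cand], features[s]) for s in seen)
        collab = max(cosine(interaction_vector(cand), interaction_vector(s)) for s in seen)
        # A new item has no interactions, so collab == 0 and only the content term ranks it
        scores[cand] = alpha * content + (1 - alpha) * collab
    return sorted(candidates, key=lambda i: (-scores[i], i))[:k]


if __name__ == "__main__":
    features = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [0.9, 0.1]}
    history = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"a"}}  # "d" is a new item
    print(recommend("u9", history, features))  # unknown user -> popularity fallback
    print(recommend("u3", history, features))
```

In this toy setup, the new item `d` has no viewers yet still outranks `c` for user `u3` because its content vector resembles `a`. This is one way a hybrid can soften the new-item problem and the bias toward already popular items, while users with no history fall back to the most-watched videos.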