Automatic Speech Recognition (ASR) has advanced enormously over the past decade, attracting considerable research interest. Nevertheless, system implementation, language adaptation, and robustness remain major obstacles. Sanskrit is particularly challenging for such systems because it is morphologically more complex than many languages and lacks standard speech corpora. Deep learning is now widely applied across research domains and has become central to this field. Automatic speech recognition is the capability of a machine or program to identify spoken words and transcribe speech; it requires matching a vocal pattern against a pre-existing or previously learned vocabulary. This study develops an optimized recurrent neural network (RNN) and convolutional neural network (CNN) based Sanskrit speech recognition system. Additionally, the Connectionist Temporal Classification (CTC) loss function is employed to maximize the likelihood of the correct transcription. The system was trained to recognize Sanskrit using 46,000 utterances from 27 distinct speakers. The experimental results show promise for automated extraction of valuable information from Sanskrit speech. Accurate handling of language and accent variation is essential for effective human-computer interaction, and this work contributes toward more streamlined and efficient approaches to that objective.
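The abstract does not specify the implementation framework; as a minimal illustrative sketch of the CTC objective mentioned above, the pure-Python forward algorithm below computes the log-likelihood that CTC training maximizes. It assumes frame-wise log-posteriors over a symbol inventory with the blank at index 0; all function and variable names here are hypothetical, not taken from the paper.

```python
import math

def ctc_log_likelihood(log_probs, labels, blank=0):
    """CTC forward algorithm: log P(labels | frame-wise log-posteriors).

    log_probs: list of T frames, each a list of log-probabilities per symbol.
    labels: target symbol indices (without blanks).
    """
    # Extended label sequence with blanks interleaved: b, l1, b, l2, ..., b
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logadd(a, b):
        # Numerically stable log(exp(a) + exp(b))
        if a == NEG_INF:
            return b
        if b == NEG_INF:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    # alpha[s]: log prob of all paths ending at ext[s] after the current frame
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a = logadd(a, alpha[s - 1])   # advance by one position
            # Skip a blank when adjacent labels differ
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logadd(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # Valid endings: final label or trailing blank
    return logadd(alpha[S - 1], alpha[S - 2] if S > 1 else NEG_INF)
```

For example, with two frames, a two-symbol inventory {blank, "a"} and uniform posteriors, the three frame paths that collapse to "a" each have probability 0.25, so the CTC likelihood is 0.75. In training, the negative of this quantity is minimized so that all alignments consistent with the correct transcription gain probability.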