Speech recognition is increasingly common in real-world applications. One promising application is automatic emotion and gender recognition, which aims to detect a speaker's emotion and to distinguish male from female voices in emotional speech databases. Many previous studies on speech emotion recognition have covered various languages. However, the performance of automatic speech emotion and gender recognition systems degrades noticeably under cross-corpus conditions, for example when multiple languages are present or when the test language, such as Urdu, was not seen during training. This study addresses automatic emotion detection and gender identification using publicly available emotional speech databases. Two public Western-language databases, RAVDESS (English) and EmoDB (German), are combined for training, and an Urdu database is used for testing. The k-fold ensemble soft-voting model, the deep learning model, and the augmented deep learning model obtained 79%, 82%, and 97.6% accuracy, respectively. These results are considerably better than those of many existing systems, and the performance evaluation results are also encouraging. The proposed technique is sufficiently robust and can efficiently detect emotion and identify gender from the Urdu database. The approach can be used in a wide range of applications.
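For readers unfamiliar with soft voting, the following is a minimal sketch of the general idea, not the implementation used in this study. It assumes that each of the k fold-trained models outputs a probability per class for every test sample; the function name `soft_vote` and the example probabilities are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Generic soft voting (illustrative; not the paper's code).

    model_probs[m][i][c] is the probability that model m assigns to
    class c for sample i. The class probabilities are averaged across
    models, and the class with the highest mean is predicted.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])

    predictions = []
    for i in range(n_samples):
        # Mean probability of each class over all models for sample i.
        mean = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the largest mean probability.
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


# Hypothetical outputs of three fold models for two samples and two classes.
example_probs = [
    [[0.9, 0.1], [0.4, 0.6]],  # model from fold 1
    [[0.4, 0.6], [0.3, 0.7]],  # model from fold 2
    [[0.4, 0.6], [0.6, 0.4]],  # model from fold 3
]

print(soft_vote(example_probs))  # [0, 1]
```

In this made-up example, two of the three models favor class 1 for the first sample, yet soft voting selects class 0 because the first model is far more confident. This is how soft voting differs from a simple majority vote.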