Most speech recognition systems rely on cloud computing for model training and updates. Speech data can constitute personally identifiable information (PII) and often contains personal, privacy-sensitive, and regulated content. Relying on centralized servers or third parties can expose such confidential data and lead to privacy breaches. Consequently, privacy concerns and strict regulations (e.g., the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and Australia’s Privacy Act) limit the availability of large data sets. This scarcity is particularly pronounced for less-represented languages such as Persian, hindering innovation and data-driven product development. To address both data scarcity and privacy concerns, we propose, for the first time, a federated learning (FL) solution for Persian spoken isolated digit recognition. The proposed technique bridges the gap between privacy and utility by training a model on decentralized data sets stored on edge devices or servers, without exchanging the raw data. Non-independent and identically distributed (non-IID) data, arising for example from speakers’ distinct accents, poses a challenge for speech recognition, especially in an FL setup, and has largely been overlooked by existing techniques. To address this, we present a personalized clustered FL (PCFL) approach that exploits similarities among clients’ private data distributions while capturing the distinctive characteristics of each client’s data. Experimental results show that the proposed solution addresses privacy concerns while incurring only a negligible performance loss compared with centralized model training.
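To make the idea of personalized clustered FL more concrete, the following is a minimal, illustrative sketch of one aggregation round, not the PCFL algorithm described in this work. It assumes that each client sends only its locally trained parameter vector (never its speech data), that clients are grouped by a simple greedy cosine-similarity test on their model updates, that each cluster model is a sample-weighted average of its members (as in federated averaging), and that personalization is a linear interpolation between the cluster model and the client's local model. The names `pcfl_round`, `threshold`, and `mix` are hypothetical.

```python
from math import sqrt


def weighted_average(vectors, weights):
    """Sample-weighted average of parameter vectors (FedAvg-style aggregation)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two update vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_clients(updates, threshold):
    """Greedy grouping (an assumed heuristic): a client joins the first cluster
    whose first member's update has cosine similarity >= threshold with its own."""
    clusters = []  # each cluster is a list of client indices
    for i, u in enumerate(updates):
        for members in clusters:
            if cosine(updates[members[0]], u) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def pcfl_round(global_model, local_models, num_samples, threshold=0.5, mix=0.5):
    """One hypothetical round: cluster clients by their updates, aggregate
    within each cluster, then blend each cluster model with the client's own model."""
    updates = [[l - g for l, g in zip(local, global_model)] for local in local_models]
    clusters = cluster_clients(updates, threshold)
    personalized = [None] * len(local_models)
    for members in clusters:
        cluster_model = weighted_average(
            [local_models[i] for i in members], [num_samples[i] for i in members]
        )
        for i in members:
            # Interpolate: mix * shared cluster knowledge + (1 - mix) * local specifics.
            personalized[i] = [
                mix * c + (1 - mix) * l for c, l in zip(cluster_model, local_models[i])
            ]
    return clusters, personalized
```

For instance, with four clients whose local models are `[1.0, 0.0]`, `[0.9, 0.1]`, `[0.0, 1.0]`, and `[0.1, 0.9]` (a stand-in for two accent groups) and a zero global model, the function groups clients 0 and 1 together and clients 2 and 3 together, so each client's personalized model draws only on peers with similar data. The interpolation weight `mix` trades off shared knowledge against client-specific characteristics; a real system would tune it, or learn the clustering differently, on validation data.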