Record Deduplication (RD) aims to identify instances that represent the same real-world entity in data repositories. In government settings, RD facilitates the identification of irregularities and reduces the consumption of computing resources in data integration tasks. In this context, we propose DedupeGov, a scalable, effective, and efficient platform for integrating large data repositories (on the order of millions of records) that unifies duplicate entities from multiple, distinct sources. Our experimental results indicate a 21.8% reduction in the number of records in the original repository, with 99% precision and 95% recall in identifying duplicate records. In addition, our platform was able to build more complete records, eliminating at least 32% of records with null attributes. Furthermore, our solution is efficient and scalable for large volumes of data, deduplicating a repository of almost 400 million records in around one hour, and it is easy to generalize to different entity types.