
Deduplicating Large Volumes of Data from Natural and Legal Entities in the Governmental Field

Resource Type: Conference
Authors: Carvalho, Marcos; Mangaravite, Vitor; Ponce, Lucas M.; Cantelli, Luis; Campoi, Bruno; Nunes, Gabriel; Miranda de Paiva, Bruno Barbosa; Laender, Alberto H. F.; Goncalves, Marcos Andre
Source: 2022 IEEE International Conference on Big Data (Big Data), pp. 2206-2213, Dec. 2022
Subject: Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
Geoscience
Robotics and Control Systems
Signal Processing and Analysis
Law
Government
Buildings
Data integration
Big Data
Task analysis
Record Deduplication
Dedupe
Data management
Data Systems
Natural and Legal Entities
Language

Online Access

Full Text (IEEE)

Abstract

Record Deduplication (RD) aims to identify instances that represent the same real-world entity in data repositories. In the government environment, RD facilitates the identification of irregularities and reduces the consumption of computing resources in data integration tasks. In this context, we propose DedupeGov, a scalable, effective, and efficient platform for integrating large data repositories (i.e., with volumes in the order of millions of records) that unifies duplicate entities from multiple, heterogeneous sources. Our experimental results indicate a 21.8% reduction in the number of records in the original repository, with 99% precision and 95% recall when identifying duplicate records. In addition, our platform was able to build more complete records, eliminating at least 32% of the records with null attributes. Furthermore, our solution is highly efficient and scalable for large volumes of data, deduplicating a repository of almost 400 million records in around one hour, and it is easy to generalize to different entity types.
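The abstract does not describe DedupeGov's internals, so the following is only a generic, minimal sketch of the ideas it mentions, not the authors' method. It assumes a common RD pipeline: records are grouped into blocks so that only records sharing a blocking key are compared, matched pairs are clustered transitively with a union-find structure, each cluster is merged by taking the first non-null value per attribute (one simple way to obtain "more complete records"), and results are scored with pairwise precision and recall plus the fraction of records removed. The field names (`name`, `doc`, `city`), the blocking key, the matching rule `same_entity`, and the toy data are all hypothetical; the abstract also does not say whether its precision and recall are computed pairwise.

```python
from collections import defaultdict
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace; None becomes an empty string."""
    return " ".join(value.lower().split()) if value else ""


class UnionFind:
    """Disjoint-set forest used to group matched records transitively."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def deduplicate(records, block_key, is_match):
    """Compare records only within the same block; return clusters of indices."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    uf = UnionFind(len(records))
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if is_match(records[i], records[j]):
                uf.union(i, j)
    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[uf.find(i)].append(i)
    return list(clusters.values())


def merge(records, cluster):
    """Build one record per cluster, taking the first non-null value per attribute."""
    merged = {}
    for i in cluster:
        for key, value in records[i].items():
            if merged.get(key) is None:
                merged[key] = value
    return merged


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision/recall of predicted clusters against ground-truth clusters."""
    def pairs(clusters):
        return {tuple(sorted(p)) for c in clusters for p in combinations(c, 2)}

    pred, true = pairs(predicted), pairs(truth)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    return precision, recall


def same_entity(a, b):
    """Hypothetical rule: names agree and registry numbers do not conflict."""
    names_agree = normalize(a["name"]) == normalize(b["name"])
    docs_conflict = a["doc"] and b["doc"] and a["doc"] != b["doc"]
    return names_agree and not docs_conflict


# Hypothetical toy data; "doc" stands in for a registry number such as a taxpayer ID.
records = [
    {"name": "Maria Silva", "doc": "123", "city": None},
    {"name": "MARIA  SILVA", "doc": "123", "city": "Belo Horizonte"},
    {"name": "Joao Souza", "doc": "456", "city": "Recife"},
    {"name": "joao souza", "doc": None, "city": "Recife"},
    {"name": "Ana Lima", "doc": "789", "city": "Natal"},
]

# Hypothetical blocking key: first three characters of the normalized name.
clusters = deduplicate(records, lambda r: normalize(r["name"])[:3], same_entity)
merged = [merge(records, c) for c in clusters]
reduction = 1 - len(clusters) / len(records)  # fraction of records removed
precision, recall = pairwise_precision_recall(clusters, [[0, 1], [2, 3], [4]])
```

On this toy input the five records collapse into three clusters (a 40% reduction), and the merged record for Maria Silva takes its city from the second copy. Blocking is what keeps such a pipeline tractable at the scale the abstract reports, since comparing every pair among hundreds of millions of records is infeasible; the trade-off is that duplicates landing in different blocks are never compared, which lowers recall.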
