A distributed incremental information acquisition model for large-scale text data.
- Resource Type
- Article
- Authors
- Sun, Shengtao; Gong, Jibing; Zomaya, Albert Y.; Wu, Aizhi
- Source
- Cluster Computing. Jan2019 Supplement 1, Vol. 22 Issue 1, p2383-2394. 12p.
- Subject
- *INFORMATION modeling
*BIG data
*EMPLOYEE reviews
*DATA
- Language
- ISSN
- 1386-7857
Discovering and acquiring information from incremental data on the Internet in a timely manner is an active research topic in the big data era. This paper presents a distributed incremental information acquisition model for large-scale text data. To achieve a lower false positive rate and higher efficiency than the traditional Bloom filter, a distributed multidimensional Bloom filter is proposed for deduplicating large-scale Web URL text data. Three Bloom filter-based methods are compared in terms of false positive rate and response efficiency. The results show that the proposed model achieves a high duplicate removal rate with a lower false positive rate. [ABSTRACT FROM AUTHOR]
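For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a traditional, single-node Bloom filter used for URL deduplication. It is not the distributed multidimensional design from the article, whose details are not given in this record. The names `BloomFilter` and `deduplicate`, the SHA-256 double-hashing scheme, and the example URLs are all illustrative assumptions; only the sizing formulas are the standard textbook ones.

```python
import hashlib
import math


class BloomFilter:
    """Traditional single-node Bloom filter (illustrative baseline only)."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k bit positions from one SHA-256 digest via
        # double hashing, position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd, hence nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen" (no false negatives).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def deduplicate(urls, bloom: BloomFilter):
    """Return URLs not previously recorded in the filter, recording each as it is seen."""
    fresh = []
    for url in urls:
        if url not in bloom:
            bloom.add(url)
            fresh.append(url)
    return fresh


if __name__ == "__main__":
    bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    batch = ["http://a.example/1", "http://a.example/2", "http://a.example/1"]
    print(deduplicate(batch, bloom))
```

Because the filter persists between calls, passing each newly crawled batch to `deduplicate` with the same `bloom` instance yields only URLs not seen in earlier batches, which is the incremental behaviour the abstract describes. A false positive in this setting means a genuinely new URL is occasionally discarded as a duplicate, which is why the false positive rate is the key metric.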