Differentially Private Heavy Hitter Detection using Federated Analytics
- Resource Type
- Conference
- Authors
- Chadha, Karan; Chen, Junye; Duchi, John; Feldman, Vitaly; Hashemi, Hanieh; Javidbakht, Omid; McMillan, Audra; Talwar, Kunal
- Source
- 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pp. 512-533, Apr. 2024
- Subject
- Communication, Networking and Broadcast Technologies
Computing and Processing
Differential privacy
Privacy
Social networking (online)
Heuristic algorithms
Machine learning
Data models
Blocklists
Federated Analytics
Differential Privacy
Frequency Estimation
Heavy Hitter Identification
- Language
In this work, we study practical heuristics to improve the performance of prefix-tree-based algorithms for differentially private heavy hitter detection. Our model assumes that each user holds multiple data points, and the goal is to learn as many of the most frequent data points as possible across all users’ data under aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication, and privacy constraints. We explore the impact of different data-selection schemes, as well as the impact of introducing deny lists across multiple runs of the algorithm. We evaluate these improvements through extensive experiments on the Reddit dataset [Caldas et al., 2018], using the task of learning the most frequent words.
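As a rough illustration of the general technique the abstract refers to, the sketch below grows a prefix tree one character at a time and keeps only the prefixes whose noisy count clears a threshold. It is not the authors' algorithm: it swaps the paper's aggregate and local privacy mechanisms for plain central Laplace noise, uses a fixed threshold rather than adaptive tuning, and assumes add/remove neighbouring datasets. All names (`prefix_tree_heavy_hitters`, `laplace_noise`, `END`) and parameters are hypothetical, and the floating-point noise is not hardened for production use.

```python
import random
from collections import Counter

END = "$"  # assumed end-of-word marker; must not occur in the alphabet


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def prefix_tree_heavy_hitters(user_words, alphabet, max_len, epsilon,
                              threshold, deny_list=(), seed=0):
    """Return words whose noisy count clears `threshold` at every tree level."""
    rng = random.Random(seed)
    deny = set(deny_list)

    # Data selection (one simple scheme): each user contributes a single word,
    # sampled uniformly from their words that are not on the deny list.
    selected = []
    for words in user_words:
        candidates = [w for w in words if w not in deny and len(w) <= max_len]
        if candidates:
            selected.append(rng.choice(candidates) + END)

    # One contribution per user gives sensitivity 1 per level; the budget is
    # split evenly across levels (basic sequential composition).
    levels = max_len + 1
    scale = levels / epsilon
    symbols = list(alphabet) + [END]

    frontier = [""]
    found = []
    for depth in range(1, levels + 1):
        counts = Counter(w[:depth] for w in selected if len(w) >= depth)
        survivors = []
        # Noise is added to every candidate child, including zero-count ones.
        for prefix in frontier:
            for symbol in symbols:
                child = prefix + symbol
                if counts[child] + laplace_noise(scale, rng) >= threshold:
                    if symbol == END:
                        found.append(prefix)  # a complete word
                    else:
                        survivors.append(child)
        frontier = survivors
        if not frontier:
            break
    return found
```

Passing words discovered in an earlier run as `deny_list` makes users select among their remaining words, which is one simple way to picture how deny lists could help later runs surface additional items.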