Over-the-Air Federated Adaptive Data Analysis: Preserving Accuracy via Opportunistic Differential Privacy

Document Type : Research Article

Authors

Information Systems and Security Lab (ISSL), Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran

Abstract
Adaptive data analysis (ADA) involves a dynamic interaction between an analyst and a dataset owner: the analyst submits queries sequentially and adapts each one to the previous answers. This process can become adversarial, as the analyst may attempt to overfit by targeting patterns in the data that do not generalize. To counteract this, the dataset owner randomizes the responses, for example by adding noise. This noise not only helps prevent overfitting but also enhances data privacy. However, it must be carefully calibrated so that the statistical reliability of the responses is not compromised. In this paper, we extend the ADA problem to distributed datasets. Specifically, we consider a scenario in which a potentially adversarial analyst interacts with multiple distributed responders through adaptive queries. We assume that the responses are subject to noise introduced by the channel connecting the responders and the analyst. We demonstrate how this noise can be opportunistically leveraged through a federated mechanism to enhance the generalizability of ADA, thereby increasing the number of query-response interactions between the analyst and the responders. We show that carefully tuning the transmission amplitude based on the theoretically achievable bounds can significantly affect the number of queries that can be answered accurately.
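
As a rough illustration of why the transmission amplitude matters, the sketch below simulates one query round under a deliberately simple model that is an assumption here and not the paper's mechanism: every responder scales its local answer by the same amplitude, the channel sums the transmitted signals, and the receiver adds a single zero-mean Gaussian noise sample. The function names `over_the_air_answer` and `effective_noise_std` and all parameters are hypothetical.

```python
import random


def over_the_air_answer(local_answers, amplitude, channel_noise_std, rng):
    """Estimate the average of the local answers from one noisy channel use.

    Assumed model (illustrative only): identical amplitude at every
    responder, additive superposition, Gaussian receiver noise, no fading.
    """
    num_responders = len(local_answers)
    # Superposition on the channel: the amplitude-scaled answers add up.
    received = amplitude * sum(local_answers) + rng.gauss(0.0, channel_noise_std)
    # The analyst rescales the received signal back to an average.
    return received / (amplitude * num_responders)


def effective_noise_std(amplitude, channel_noise_std, num_responders):
    """Standard deviation of the noise left in the rescaled answer."""
    return channel_noise_std / (amplitude * num_responders)


if __name__ == "__main__":
    rng = random.Random(0)
    answers = [0.2, 0.4, 0.6]  # hypothetical local answers in [0, 1]
    for amplitude in (0.5, 1.0, 2.0):
        estimate = over_the_air_answer(answers, amplitude, 1.0, rng)
        std = effective_noise_std(amplitude, 1.0, len(answers))
        print(f"amplitude={amplitude}: estimate={estimate:.3f}, noise std={std:.3f}")
```

Under these assumptions, a smaller amplitude leaves more channel noise in each answer, which is the kind of randomization that limits overfitting, while a larger amplitude gives a more precise answer to each individual query. The achievable bounds mentioned in the abstract would determine where to set this trade-off; the sketch does not attempt to reproduce them.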
