Document Type : Research Article

Author

School of Engineering, Damghan University, Damghan, Iran

Abstract

Privacy issues during data publishing is an increasing concern of involved entities. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational privacy models such as $k$-anonymity in which data is clustered into groups of at least $k$ members. Microaggregation is a mechanism to realize $k$-anonymity. The objective is to assign records of a dataset to clusters and replace the original values with their associated cluster centers which are the average of assigned values to minimize information loss in terms of the sum of within group squared errors ($SSE$). While the problem is shown to be NP-hard in general, there is an optimal polynomial-time algorithm for univariate datasets. This paper shows that the assignment of the univariate microaggregation algorithm cannot produce optimal partitions for integer observations where the computed centroids have to be integer values. In other words, the integrality constraint on published quantities has to be addressed within the algorithm steps and the optimal partition cannot be attained using only the results of the general solution. Then, an effective method that considers the constraint is proposed and analyzed which can handle very large numerical volumes. Experimental evaluations confirm that the developed algorithm not only produces more useful datasets but also is more efficient in comparison with the general optimal univariate algorithm.

Keywords

[1] Leon CRJ Willenborg and Ton De Waal. Elements of statistical disclosure control, volume 155. Springer Verlag, 2001.
[2] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002.
[3] J. Domingo-Ferrer, A. Solanas, and A. MartinezBalleste. Privacy in statistical databases: kanonymity through microaggregation. In Proceedings of International Conference on Granular Computing, pages 774–777. IEEE, 2006.
[4] William E Winkler. Re-identification methods for masked microdata. In Privacy in statistical databases, pages 216–230. Springer, 2004.
[5] Josep Domingo-Ferrer and Vicenç Torra. Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Mining and Knowledge Discovery, 11(2):195–212, 2005.
[6] Reza Mortazavi and Saeed Jalili. Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization. Data Mining and Knowledge Discovery,30(3):605–639, 2016.
[7] François Rousseau, Jordi Casas-Roma, and Michalis Vazirgiannis. Community-preserving anonymization of graphs. Knowledge and Information Systems, pages 1–29, 2017.
[8] Reza Mortazavi and Seyyedeh Hamideh Erfani. An effective method for utility preserving social network graph anonymization based on mathematical modeling. International Journal of Engineering, 31(10):1624–1632, 2018.
[9] Jordi Soria-Comas and Josep Domingo-Ferrer. Differentiallyprivatedatapublishingviaoptimal univariate microaggregation and record perturbation. Knowledge-Based Systems, 153:78 – 90, 2018.
[10] A. Oganian and J. Domingo-Ferrer. On the complexity of optimal microaggregation for statistical disclosure control. Statistical Journal of the United Nations Economic Commission for Europe, 18(4):345–354, 2001.
[11] S.L. Hansen and S. Mukherjee. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering, 15(4):1043–1044, 2003.
[12] Kun Liu and Evimaria Terzi. Towards identity anonymization on graphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 93–106. ACM, 2008.
[13] Jordi Casas-Roma, Jordi Herrera-Joancomartí, and Vicenç Torra. k-Degree anonymity and edge selection: improving data utility in large networks. Knowledge and Information Systems, 50(2):447–474, 2017.
[14] Reza Mortazavi, Saeed Jalili, and Hojjat Gohargazi. Multivariate microaggregation by iterativeoptimization.AppliedIntelligence,39(3):529– 544, 2013.
[15] J. Domingo-Ferrer and J.M. Mateo-Sanz. Practical data-oriented microaggregation for statistical disclosurecontrol. IEEE Transactions on Knowedge and Data Engineering, 14(1):189–201, 2002.
[16] JosepDomingo-Ferrer,AntoniMartínez-Ballesté, Josep Maria Mateo-Sanz, and Francesc Sebé. Efficient multivariate data-oriented microaggregation. The VLDB Journal, 15(4):355–369, 2006.
[17] JordiNin,JavierHerranz,andVicençTorra. On the disclosure risk of multivariate microaggregation. Data & Knowledge Engineering, 67(3):399 – 412, 2008.
[18] David Rebollo Monedero, Ahmad Mohamad Mezher, Xavier Casanova Colomé, Jordi Forné, and Miguel Soriano. Efficient k-anonymous microaggregation of multivariate numerical data via principal component analysis. Information Sciences, 503:417 – 443, 2019.
[19] Reza Mortazavi and Saeed Jalili. Fast dataoriented microaggregation algorithm for large numerical datasets. Knowledge-Based Systems, 67:195 – 205, 2014.
[20] Cynthia Dwork. Differential Privacy, pages 338– 340. Springer US, Boston, MA, 2011.
[21] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2009.
[22] Reza Mortazavi and Saeed Jalili. A novel local search method for microaggregation. ISeCure, 7(1):15 – 26, 2015.
[23] B. Heaton. New Record Ordering Heuristics for Multivariate Microaggregation. PhD thesis, Nova Southeastern University, 2012.