A novel local search method for microaggregation

Document Type: ORIGINAL RESEARCH PAPER

Abstract

In this paper, we propose an effective microaggregation algorithm that produces more useful protected data for publishing. Microaggregation is mapped to a clustering problem with known minimum and maximum group size constraints. In this scheme, the goal is to cluster n records into groups of at least k and at most 2k − 1 records such that the sum of within-group squared errors (SSE) is minimized. We propose a local search algorithm that iteratively satisfies the constraints of the optimal solution of the problem. The algorithm solves the problem in O(n²) time. Experimental results on real and synthetic data sets with different distributions demonstrate the effectiveness of the method in producing useful protected data sets.
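To make the objective concrete, the following is a minimal sketch of how the SSE of a candidate grouping and the group size constraints could be evaluated. It is not the local search algorithm proposed in the paper; the function names `within_group_sse` and `satisfies_size_constraints`, and the representation of records as numeric tuples with one group label per record, are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def within_group_sse(records, labels):
    """Sum of squared distances from each record to its group centroid.

    records: list of equal-length numeric tuples (hypothetical input format).
    labels:  list of group identifiers, one per record.
    """
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)

    sse = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid = per-attribute mean of the group's records.
        centroid = [sum(r[d] for r in members) / len(members) for d in range(dim)]
        sse += sum((r[d] - centroid[d]) ** 2 for r in members for d in range(dim))
    return sse


def satisfies_size_constraints(labels, k):
    """True if every group has at least k and at most 2k - 1 records."""
    return all(k <= size <= 2 * k - 1 for size in Counter(labels).values())


# Toy univariate example: two groups of three records each.
records = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = [0, 0, 0, 1, 1, 1]

print(within_group_sse(records, labels))        # 4.0
print(satisfies_size_constraints(labels, k=3))  # True
```

A lower SSE means the records in each group lie closer to their centroid, so replacing them by that centroid loses less information. The size check reflects the constraint stated above: each group holds between k and 2k − 1 records.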
