[1] Adel Alshamrani, Sowmya Myneni, Ankur Chowdhary, and Dijiang Huang. A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities. IEEE Communications Surveys & Tutorials, 21(2):1851–1877, 2019.
[2] James P Anderson. Computer security threat monitoring and surveillance. Technical Report, James P. Anderson Company, 1980.
[3] Ankit Thakkar and Ritika Lohiya. A review of the advancement in intrusion detection datasets. Procedia Computer Science, 167:636–645, 2020.
[4] Sowmya Myneni, Ankur Chowdhary, Abdulhakim Sabur, Sailik Sengupta, Garima Agrawal, Dijiang Huang, and Myong Kang. DAPT 2020 - constructing a benchmark dataset for advanced persistent threats. In Deployable Machine Learning for Security Defense: First International Workshop, MLHat 2020, San Diego, CA, USA, August 24, 2020, Proceedings 1, pages 138–163. Springer, 2020.
[5] Jinxin Liu, Yu Shen, Murat Simsek, Burak Kantarci, Hussein T Mouftah, Mehran Bagheri, and Petar Djukic. A new realistic benchmark for advanced persistent threats in network traffic. IEEE Networking Letters, 4(3):162–166, 2022.
[6] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[7] Zhaoqing Pan, Weijie Yu, Xiaokai Yi, Asifullah Khan, Feng Yuan, and Yuhui Zheng. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access, 7:36322–36333, 2019.
[8] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional gan. Advances in neural information processing systems, 32, 2019.
[9] CopulaGAN model. https://sdv.dev/SDV/user_guides/single_table/copulagan.html. Accessed: 25 January 2025.
[10] SDV Dev Team. Sdv documentation. https://docs.sdv.dev/sdv/. Accessed: 25 January 2025.
[11] Li Yang, Abdallah Moubayed, Ismail Hamieh, and Abdallah Shami. Tree-based intelligent intrusion detection system in internet of vehicles. In 2019 IEEE Global Communications Conference (GLOBECOM), pages 1–6. IEEE, 2019.
[12] Sdmetrics documentation. https://docs.sdv.dev/sdmetrics/. Accessed: 25 January 2025.
[13] Unb-cs-ids dataset (2018). https://www.unb.ca/cic/datasets/ids-2018.html. Accessed: 25 January 2025.
[14] Unb-cs-ids dataset (2017). https://www.unb.ca/cic/datasets/ids-2017.html. Accessed: 25 January 2025.
[15] Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
[16] Ali Shiravi, Hadi Shiravi, Mahbod Tavallaee, and Ali A Ghorbani. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security, 31(3):357–374, 2012.
[17] Unb-cs-ids dataset (2018). https://www.unb.ca/cic/datasets/ids.html. Accessed: 25 January 2025.
[18] Richard P Lippmann, David J Fried, Isaac Graf, Joshua W Haines, Kristopher R Kendall, David McClung, Dan Weber, Seth E Webster, Dan Wyschogrod, Robert K Cunningham, et al. Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation. In Proceedings DARPA Information Survivability Conference and Exposition. DISCEX’00, volume 2, pages 12–26. IEEE, 2000.
[19] Benjamin Sangster, TJ O’Connor, Thomas Cook, Robert Fanelli, Erik Dean, Christopher Morrell, and Gregory J Conti. Toward instrumenting network warfare competitions to generate labeled datasets. In CSET, 2009.
[20] Yusuke Takahashi, Shigeyoshi Shima, Rui Tanabe, and Katsunari Yoshioka. APTGen: An approach towards generating practical dataset labelled with targeted attack sequences. 2020.
[21] Stavroula Bourou, Andreas El Saer, Terpsichori Helen Velivassaki, Artemis Voulkidis, and Theodore Zahariadis. A review of tabular data synthesis using GANs on an IDS dataset. Information, 12(9):375, 2021.
[22] Jiayu Wang, Xuehu Yan, Lintao Liu, Longlong Li, and Yongqiang Yu. Cttgan: Traffic data synthesizing scheme based on conditional gan. Sensors, 22(14):5243, 2022.
[23] Ayesha Siddiqua Dina, AB Siddique, and D Manivannan. Effect of balancing data using synthetic data on the performance of machine learning classifiers for intrusion detection in computer networks. IEEE Access, 10:96731–96747, 2022.
[24] Drake Cullen, James Halladay, Nathan Briner, Ram Basnet, Jeremy Bergen, and Tenzin Doleck. Evaluation of synthetic data generation techniques in the domain of anonymous traffic classification. IEEE Access, 10:129612–129625, 2022.
[25] Cumulative distribution function. URL https://en.wikipedia.org/wiki/Cumulative_distribution_function. Accessed: 14 September 2024.
[26] Kolmogorov–Smirnov test. URL https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test. Accessed: 25 January 2025.
[27] Pearson correlation coefficient. URL https://en.wikipedia.org/wiki/Pearson_correlation_coefficient. Accessed: 25 January 2025.
[28] scipy.stats.pearsonr documentation. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html. Accessed: 25 January 2025.
[29] Spearman’s rank correlation coefficient. URL https://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient. Accessed: 25 January 2025.
[30] scipy.stats.spearmanr documentation. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html. Accessed: 25 January 2025.
[31] Contingency table. URL https://en.wikipedia.org/wiki/Contingency_table. Accessed: 25 January 2025.
[32] Total variation distance of probability measures. URL https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures. Accessed: 25 January 2025.
[33] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.