Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

Document Type: ORIGINAL RESEARCH PAPER

Authors

1 Department of Computer Engineering, Shahed University, Tehran, Iran

2 Network, ITRC

Abstract

Advances in computer network technologies have also made it easier for intruders to operate, so intrusion detection systems (IDS) are increasingly important as a key element of security for detecting threats and attacks. One challenge for intrusion detection systems is managing the large number of network traffic features; removing unnecessary features is one solution. Machine learning methods are among the most effective ways to design an intrusion detection system. In this paper, we propose a hybrid intrusion detection system that combines a decision tree with a support vector machine (SVM). In our method, features are first selected by pruning a C5.0 decision tree, and then the features with the lowest predictor importance are removed one at a time. After each removal, a least squares support vector machine (LS-SVM) is applied. The feature set that yields the largest area under the Receiver Operating Characteristic (ROC) curve for the LS-SVM is taken as the final feature set. Experimental results on two data sets, KDD Cup 99 and UNSW-NB15, show that the proposed approach improves the true positive rate, false positive rate, and accuracy compared with the best prior work.
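The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration. C5.0 predictor importance is replaced by the information gain of a single threshold split (stump_gain). The LS-SVM uses an RBF kernel with arbitrary gamma and sigma2 values. Ties in AUC are resolved in favor of the smaller feature set. The data are synthetic, with only feature 0 carrying signal. All function names are hypothetical.

```python
import math
import random

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) pairs; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def entropy(labels):
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)

def stump_gain(column, labels):
    """Assumed importance proxy (not C5.0): best information gain of one split."""
    best = 0.0
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, entropy(labels) - split)
    return best

def rbf(a, b, sigma2=1.0):
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2.0 * sigma2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0):
    """LS-SVM classifier: solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega[i][j] = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])] + [y[i] * y[j] * rbf(X[i], X[j]) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def lssvm_score(X, y, b, alpha, x):
    return sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

def take(X, cols):
    return [[row[j] for j in cols] for row in X]

def select_features(Xtr, ytr, Xte, yte):
    """Drop the least important feature each round; keep the set with the best AUC."""
    n_feat = len(Xtr[0])
    order = sorted(range(n_feat), key=lambda j: stump_gain([r[j] for r in Xtr], ytr))
    active = list(range(n_feat))
    best_set, best_auc = None, -1.0
    for weakest in order:
        A, B = take(Xtr, active), take(Xte, active)
        b, alpha = lssvm_fit(A, ytr)
        current = auc([lssvm_score(A, ytr, b, alpha, x) for x in B], yte)
        if current >= best_auc:  # assumption: prefer fewer features on ties
            best_set, best_auc = list(active), current
        active.remove(weakest)
    return best_set, best_auc

def make_data(n, rng):
    """Synthetic data: feature 0 separates the classes, features 1-3 are noise."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label + rng.uniform(-0.5, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

rng = random.Random(0)
Xtr, ytr = make_data(40, rng)
Xte, yte = make_data(40, rng)
best_set, best_auc = select_features(Xtr, ytr, Xte, yte)
print(best_set, round(best_auc, 3))
```

On real traffic data such as KDD Cup 99 or UNSW-NB15, the importance ranking would come from a trained C5.0 model, and the dense linear solve would need to be replaced by something that scales beyond a few hundred training samples.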
