Classification of encrypted traffic for applications based on statistical features

Document Type: ORIGINAL RESEARCH PAPER

Authors

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran

Abstract

Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies that restrict network access. Early methods in this field relied on obvious traffic features such as port number and protocol type to determine the traffic type. However, recent changes in applications have made these features unreliable for such tasks. As a remedy, network traffic classification based on machine learning techniques is now evolving. In this article, a new semi-supervised learning method is proposed that combines clustering algorithms with label propagation techniques. The clustering stage is based on graph theory and the minimum spanning tree (MST) algorithm. In the next stage, a few pivot data instances are selected for an expert to label, and the resulting class labels are propagated to similar unlabeled data instances. In the final stage, a decision tree algorithm is used to construct the classification model. The results show that the proposed method classifies the encrypted traffic of network applications with high precision and accuracy. It also gives good results for plain, unencrypted traffic, especially for unbalanced data streams.
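The first three stages of the pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the MST is partitioned or how pivots are chosen, so the sketch assumes the heaviest MST edges are cut to form clusters and that the pivot is the cluster member nearest its centroid. The two flow features, the class names, and every function name here are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Assumed cut rule: drop the k-1 heaviest MST edges, keep the rest."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))  # union-find forest over the kept edges
    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i
    for _, u, v in kept:
        root[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

def pick_pivots(points, cluster_ids):
    """Assumed pivot rule: the member closest to its cluster centroid."""
    members = {}
    for i, c in enumerate(cluster_ids):
        members.setdefault(c, []).append(i)
    pivots = {}
    for c, idx in members.items():
        centroid = [sum(col) / len(idx) for col in zip(*(points[i] for i in idx))]
        pivots[c] = min(idx, key=lambda i: math.dist(points[i], centroid))
    return pivots

def propagate(cluster_ids, pivots, ask_expert):
    """Every flow inherits the label the expert gave to its cluster's pivot."""
    pivot_label = {c: ask_expert(i) for c, i in pivots.items()}
    return [pivot_label[c] for c in cluster_ids]

# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in s).
flows = [(60, 0.9), (64, 1.0), (58, 1.1),
         (1400, 0.1), (1380, 0.2), (1420, 0.1)]
truth = ["interactive"] * 3 + ["bulk"] * 3   # known only to the "expert"

queries = []                                  # records which flows the expert saw
def ask_expert(i):
    queries.append(i)
    return truth[i]

cluster_ids = mst_clusters(flows, k=2)
pivots = pick_pivots(flows, cluster_ids)
labels = propagate(cluster_ids, pivots, ask_expert)
print(labels)
```

In this toy run the expert is consulted once per cluster rather than once per flow. The propagated `labels`, paired with `flows`, would then serve as the training set for the decision tree stage, which is omitted here; any standard decision tree learner could fill that role.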
