Document Type: Research Article

Authors

Department of Computer Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

With the widespread use of Android smartphones, the Android platform has become an attractive target for attackers and malware authors. Meanwhile, the growing emergence of zero-day malware has long been a major concern for cybersecurity researchers, because previously unseen malware often exhibits new or unknown behaviors for which no documented defense exists. In recent years, deep learning has become the dominant machine learning technique for malware detection and has achieved outstanding results. Currently, most deep malware detection techniques are supervised and require training on large datasets of benign and malicious samples. However, supervised techniques usually do not perform well against zero-day malware. Semi-supervised and unsupervised deep malware detection techniques have more potential to detect previously unseen malware. In this paper, we present MalGAE, a novel end-to-end deep malware detection technique that leverages one-class graph neural networks to detect Android malware in a semi-supervised manner. MalGAE represents each Android application as an attributed function call graph (AFCG) to benefit from the ability of graphs to model complex relationships between data. It builds a deep one-class classifier by training a stacked graph autoencoder with graph convolutional layers on benign AFCGs only. Experimental results show that MalGAE achieves good detection performance across several evaluation measures.
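The one-class idea described above can be illustrated with a minimal NumPy sketch. It is not the MalGAE implementation: the abstract does not specify the layer sizes, the decoder, the loss, or the decision threshold, so the inner-product decoder, the mean squared reconstruction error, the 0.95 quantile threshold, the symmetrized call graph, and all function names below are assumptions made for illustration. The weights are random and untrained; in practice they would be fitted on benign AFCGs by minimizing the same reconstruction error.

```python
import numpy as np

def normalize_adjacency(adj):
    # Symmetric GCN normalization with self-loops: D^-1/2 (A + I) D^-1/2.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def encode(a_norm, feats, weights):
    # Stacked graph convolutions; ReLU on every layer except the last.
    h = feats
    for i, w in enumerate(weights):
        h = a_norm @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def anomaly_score(adj, feats, weights):
    # Reconstruct the adjacency with an inner-product decoder and
    # return the mean squared reconstruction error.
    sym = np.maximum(adj, adj.T)  # assumption: treat call edges as undirected
    z = encode(normalize_adjacency(sym), feats, weights)
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    return float(np.mean((sym - recon) ** 2))

def fit_threshold(benign_scores, quantile=0.95):
    # Assumption: flag anything above a high quantile of benign scores.
    return float(np.quantile(benign_scores, quantile))

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 node attributes -> 8 hidden units -> 2-dim embedding.
weights = [rng.normal(scale=0.1, size=(4, 8)),
           rng.normal(scale=0.1, size=(8, 2))]

def random_graph(n):
    # Synthetic stand-in for an AFCG: random edges and node attributes.
    adj = (rng.random((n, n)) < 0.2).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj, rng.random((n, 4))

benign_scores = [anomaly_score(*random_graph(10), weights) for _ in range(20)]
threshold = fit_threshold(benign_scores)

test_adj, test_feats = random_graph(10)
score = anomaly_score(test_adj, test_feats, weights)
is_malicious = score > threshold
```

Because only benign graphs are needed to fit the threshold, the same scoring step applies unchanged to applications from malware families never seen during training, which is the property the one-class setting is meant to provide.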

Keywords
