DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

Document Type: ORIGINAL RESEARCH PAPER

Authors

Abstract

To cope with the exponential growth of malware, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism make it difficult for signature-based systems to detect sophisticated malware. Dynamic analysis, which observes run-time behavior, provides a better means of identifying the threat. In this paper, a dynamic approach is proposed for extracting features from binaries. The run-time behavior of each binary file was captured and recorded using an in-house tool that provides a controlled environment. The proposed approach, DyVSoR, assumes that the run-time behavior of each binary can be represented by the values of its registers. To this end, register values are traced before and after each invoked API call in a binary and mapped to vectors. A method is then presented to compute the similarity between two binaries based on the value sets of their registers. To classify an unknown file, it is sufficient to compare it with the binaries in the dataset by computing the distance between the register contents of this file and those of every dataset binary. This method detected malicious samples with 96.1% accuracy and a 4% false positive rate. The list of execution traces and the dataset are available at: http://home.shirazu.ac.ir/~sami/malware
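
The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the register set, or the classifier, so everything below is an assumption made for illustration: each binary is represented as one set of observed values per register, the distance is the mean Jaccard distance over registers, and an unknown file takes the label of its nearest dataset binary. The names `REGISTERS`, `trace_distance`, and `classify` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# Illustrative subset of x86 registers (assumption, not the paper's list).
REGISTERS = ("eax", "ebx", "ecx", "edx")

# A trace maps each register to the set of values observed around API calls.
Trace = Dict[str, Set[int]]


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def trace_distance(x: Trace, y: Trace) -> float:
    """Mean per-register Jaccard distance (assumed measure, not the paper's)."""
    total = sum(
        jaccard_distance(x.get(reg, set()), y.get(reg, set()))
        for reg in REGISTERS
    )
    return total / len(REGISTERS)


def classify(unknown: Trace, dataset: List[Tuple[Trace, str]]) -> str:
    """Label the unknown trace with the label of its nearest dataset trace."""
    _, label = min(dataset, key=lambda item: trace_distance(unknown, item[0]))
    return label


if __name__ == "__main__":
    # Made-up toy traces, for demonstration only.
    known_malware = {"eax": {1, 2, 3}, "ebx": {7}, "ecx": {0}, "edx": {9}}
    known_benign = {"eax": {100}, "ebx": {200}, "ecx": {0}, "edx": {300}}
    sample = {"eax": {1, 2}, "ebx": {7}, "ecx": {0}, "edx": {8}}
    dataset = [(known_malware, "malware"), (known_benign, "benign")]
    print(classify(sample, dataset))
```

A set-based distance is used here only because the abstract speaks of value sets; a vector-based distance over the mapped feature vectors would slot into `trace_distance` in the same way.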

Keywords

