Phishing website detection using weighted feature line embedding

Document Type: ORIGINAL RESEARCH PAPER

Authors

1 Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran

2 Faculty of Information Technology Engineering, Tarbiat Modares University, Tehran, Iran

Abstract

The aim of phishing is to obtain users' private information without their permission by designing a website that mimics a trusted one. Information technology specialists do not agree on a unique definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. Moreover, the available training samples include abnormal samples that cause classification errors; for instance, some phishing samples may have features similar to those of legitimate samples, and vice versa. To solve these problems, this paper proposes a supervised feature extraction method called weighted feature line embedding. The proposed method virtually generates training samples by utilizing the feature line metric, so it can address the small sample size problem. Moreover, by assigning appropriate weights to each pair of feature points, it reduces the undesirable influence of abnormal samples. The features extracted by our method improve the performance of phishing website detection, especially when small training sets are used.
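The idea of "virtually generating" samples with the feature line metric can be illustrated with a minimal sketch. It assumes the standard nearest feature line projection: a query sample x is projected onto the line through two same-class samples xi and xj, and the projection point acts as a virtual sample. The function names, the heat-kernel weight exp(-d^2 / sigma), and the weighted scatter accumulation are hypothetical illustrations, not the paper's exact weighting scheme; a full embedding would also need an eigen-decomposition step that is omitted here.

```python
import math
from itertools import combinations


def sub(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_line_projection(x, xi, xj):
    """Project x onto the feature line through xi and xj.

    Returns (t, p) where p = xi + t * (xj - xi) is the projection point,
    used here as a virtual training sample.
    """
    d = sub(xj, xi)
    denom = dot(d, d)
    if denom == 0.0:
        # Degenerate line: xi and xj coincide, so the line is a single point.
        return 0.0, list(xi)
    t = dot(sub(x, xi), d) / denom
    p = [a + t * b for a, b in zip(xi, d)]
    return t, p


def feature_line_distance(x, xi, xj):
    """Feature line metric: distance from x to its projection on line(xi, xj)."""
    _, p = feature_line_projection(x, xi, xj)
    return math.dist(x, p)


def virtual_samples(x, same_class, k=2, sigma=1.0):
    """Return the k nearest virtual samples of x with hypothetical weights.

    same_class must not contain x itself. The heat-kernel weight is an
    assumption for illustration: closer feature lines receive larger weights.
    """
    candidates = []
    for xi, xj in combinations(same_class, 2):
        _, p = feature_line_projection(x, xi, xj)
        d = math.dist(x, p)
        w = math.exp(-d * d / sigma)  # hypothetical pair weight
        candidates.append((d, p, w))
    candidates.sort(key=lambda c: c[0])  # nearest feature lines first
    return [(p, w) for _, p, w in candidates[:k]]


def weighted_scatter(triples):
    """Accumulate sum of w * (x - p)(x - p)^T over (x, p, w) triples."""
    n = len(triples[0][0])
    s = [[0.0] * n for _ in range(n)]
    for x, p, w in triples:
        diff = sub(x, p)
        for r in range(n):
            for c in range(n):
                s[r][c] += w * diff[r] * diff[c]
    return s


if __name__ == "__main__":
    x = [1.0, 1.0]
    t, p = feature_line_projection(x, [0.0, 0.0], [2.0, 0.0])
    print(t, p)  # 0.5 [1.0, 0.0]
    print(feature_line_distance(x, [0.0, 0.0], [2.0, 0.0]))  # 1.0
```

In this sketch, each pair of same-class samples spans a line that can produce a new virtual sample for every query, which is why a small training set yields many more points for estimating scatter. Down-weighting distant feature lines is one plausible way to limit the influence of abnormal samples, but the paper's actual weights may differ.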
