Document Type: Research Article


1 Faculty of Electrical and Computer Engineering, Malek-Ashtar University of Technology, Iran

2 Faculty of Electrical and Computer Engineering, Malek-Ashtar University of Technology, Iran


Web application firewalls (WAFs) protect web applications from attacks such as SQL injection, cross-site request forgery, and cross-site scripting. Because web attacks keep growing in complexity, WAFs need to be tested and updated regularly. Various tools and techniques exist to verify that WAFs perform correctly, but most of them are manual or rely on brute-force attacks and therefore have poor efficacy. In this work, we propose a solution based on Reinforcement Learning (RL) to discover malicious payloads that can bypass WAFs. We provide an RL framework with an environment compatible with the OpenAI Gym toolset standards. This environment is used to train agents to perform WAF circumvention tasks. The agent mutates the syntax of a malicious payload using a set of modification operators as actions, without changing its semantics. Then, based on the WAF's reaction to the mutated payload, the environment assigns a reward to the agent. Guided by these rewards, the agent eventually learns a suitable sequence of mutations for a given malicious payload. Payloads that bypass the WAF reveal defects in its rules, which can then be used to tune rule-based WAFs. They can also enrich the datasets used to retrain machine-learning-based WAFs. We use Q-learning, advantage actor-critic (A2C), and proximal policy optimization (PPO) algorithms with deep neural networks. Our solution successfully evades both signature-based and machine-learning-based WAFs. While we focus on SQL injection in this work, the method can be readily extended to other string-based injection attacks.
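To make the mutate, observe, reward loop concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The two regex "rules", the three mutation operators, the reward values, the example payload, and all names (`waf_blocks`, `WafEnv`, `q_learning`, `greedy_rollout`) are hypothetical, a MySQL back end is assumed (so `/**/` acts as whitespace and `#` starts a comment), and a tabular Q-table stands in for the deep networks (deep Q-learning, A2C, PPO) used in the paper. The environment only mimics the classic OpenAI Gym `reset()`/`step()` convention and does not import gym.

```python
import random
import re
from collections import defaultdict

# Hypothetical signature-based WAF: a payload is blocked if any rule matches.
# These regexes are illustrative stand-ins, not real ModSecurity/CRS rules.
WAF_RULES = [
    re.compile(r"\bor\W+\d+=\d+", re.IGNORECASE),     # tautology such as "or 1=1"
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),  # "union select" separated by whitespace
]

def waf_blocks(payload: str) -> bool:
    return any(rule.search(payload) for rule in WAF_RULES)

# Hypothetical semantics-preserving mutation operators (the action space),
# assuming MySQL: an inline comment acts as whitespace, and 2>1 is as true as 1=1.
def space_to_comment(p: str) -> str:
    return p.replace(" ", "/**/")

def swap_tautology(p: str) -> str:
    return p.replace("1=1", "2>1")

def double_space(p: str) -> str:
    return p.replace(" ", "  ")  # harmless extra whitespace; a distractor action

START = "' or 1=1 union select null #"  # hypothetical SQL injection payload

class WafEnv:
    """Toy environment following the classic Gym reset()/step() convention."""
    ACTIONS = [space_to_comment, swap_tautology, double_space]

    def __init__(self, payload: str, max_steps: int = 4):
        self.initial = payload
        self.max_steps = max_steps

    def reset(self) -> str:
        self.payload, self.steps = self.initial, 0
        return self.payload

    def step(self, action: int):
        self.payload = self.ACTIONS[action](self.payload)
        self.steps += 1
        bypassed = not waf_blocks(self.payload)
        reward = 1.0 if bypassed else -0.1  # hypothetical reward: bonus on bypass, small step cost
        done = bypassed or self.steps >= self.max_steps
        return self.payload, reward, done, {"bypassed": bypassed}

def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning; the payload string itself is the state."""
    rng = random.Random(seed)
    n = len(env.ACTIONS)
    q = defaultdict(lambda: [0.0] * n)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n)  # explore
            else:
                action = max(range(n), key=lambda a: q[state][a])  # exploit
            nxt, reward, done, _ = env.step(action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

def greedy_rollout(env, q):
    """Apply the learned greedy policy once; return the final payload and bypass flag."""
    state, done, info = env.reset(), False, {}
    while not done:
        action = max(range(len(env.ACTIONS)), key=lambda a: q[state][a])
        state, _, done, info = env.step(action)
    return state, info["bypassed"]

if __name__ == "__main__":
    env = WafEnv(START)
    print(greedy_rollout(env, q_learning(env)))
```

In this toy setting neither useful operator evades both rules on its own (comments still match the first rule, and the tautology swap still matches the second), so the agent has to learn a two-step sequence of mutations, which is the kind of behavior the abstract describes at much larger scale.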

