Web application firewall. https://owasp.org/www-community/Web_Application_Firewall. Accessed: 2021-12-24.
 Ali Moradi Vartouni, Mohammad Teshnehlab, and Saeed Sedighian Kashi. Leveraging deep neural networks for anomaly-based web application firewall. IET Information Security, 13(4):352–361, 2019.
 Mojtaba Hemmati and Mohammad Ali Hadavi. Using deep reinforcement learning to evade web application firewalls. In 2021 18th International ISC Conference on Information Security and Cryptology (ISCISC), pages 35–41. IEEE, 2021.
 Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J Doug Tygar. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence, pages 43–58, 2011.
 Guillermo Caminero, Manuel Lopez-Martin, and Belen Carro. Adversarial environment reinforcement learning algorithm for intrusion detection. Computer Networks, 159:96–109, 2019.
 Bhagyashree Deokar and Ambarish Hazarnis. Intrusion detection system using log files and reinforcement learning. International Journal of Computer Applications, 45(19):28–35, 2012.
 Di Wu, Binxing Fang, Junnan Wang, Qixu Liu, and Xiang Cui. Evading machine learning botnet detection models via deep reinforcement learning. In ICC 2019-2019 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2019.
 Hyrum S Anderson, Anant Kharkar, Bobby Filar, David Evans, and Phil Roth. Learning to evade static pe machine learning malware models via reinforcement learning. arXiv preprint arXiv:1801.08917, 2018.
 Zhiyang Fang, Junfeng Wang, Boya Li, Siqi Wu, Yingjie Zhou, and Haiying Huang. Evading antimalware engines with deep reinforcement learning. IEEE Access, 7:48867–48879, 2019.
 Konstantin Pozdniakov, Eduardo Alonso, Vladimir Stankovic, Kimberly Tam, and Kevin Jones. Smart security audit: reinforcement learning with a deep neural network approximator. In 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pages 1–8. IEEE, 2020.
 Fabio Massimo Zennaro and Laszlo Erdodi. Modeling penetration testing with reinforcement learning using capture-the-flag challenges: tradeoffs between model-free learning and a priori knowledge. arXiv preprint arXiv:2005.12632, 2020.
 Mohamed C Ghanem and Thomas M Chen. Reinforcement learning for efficient network penetration testing. Information, 11(1):6, 2019.
 László Erdődi, Åvald Åslaugson Sommervoll, and Fabio Massimo Zennaro. Simulating sql injection vulnerability exploitation using q-learning reinforcement learning agents. Journal of Information Security and Applications, 61:102903, 2021.
 Luca Demetrio, Andrea Valenza, Gabriele Costa, and Giovanni Lagorio. Waf-a-mole: evading web application firewalls through adversarial machine learning. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, pages 1745–1752, 2020.
 Dennis Appelt, Cu D Nguyen, Annibale Panichella, and Lionel C Briand. A machine-learning-driven evolutionary approach for testing web application firewalls. IEEE Transactions on Reliability, 67(3):733–757, 2018.
 H. Hu and X. Wang. Evading web application firewalls with reinforcement learning. https://openreview.net/pdf?id=m5AntlhJ7Z5. Accessed: 2021-12-24.
 Dennis Appelt, Cu D Nguyen, and Lionel Briand. Behind an application firewall, are we safe from sql injection attacks? In 2015 IEEE 8th international conference on software testing, verification and validation (ICST), pages 1–10. IEEE, 2015.
 Gym-waf. https://github.com/sanebow/gymwaf. Accessed: 2021-12-24.
 Libinjection. https://github.com/client9/libinjection. Accessed: 2021-12-24.
 Modsecurity-nginx. https://github.com/SpiderLabs/ModSecurity-nginx. Accessed: 2021-12-24.
 Coreruleset. https://github.com/coreruleset/coreruleset. Accessed: 2021-12-24.
 A03:2021 – injection. https://owasp.org/Top10/A03_2021-Injection/. Accessed: 2021-12-24.
 Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
 Naxsi. https://github.com/nbs-system/naxsi. Accessed: 2021-12-24.
 Kevin Boone. Utf-8 and the problem of over-long characters. https://kevinboone.me/overlong.html?i=1. Accessed: 2021-12-24.
 Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, 1998.
 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
 Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016.
 Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
 Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937. PMLR, 2016.
 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
 Or Rivlin. Reinforcement learning with exploration by random network distillation. https://towardsdatascience.com/reinforcementlearning-with-exploration-by-randomnetwork-distillation-a3e412004402. Accessed: 2021-12-24.
 Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. arXiv preprint
 nxutil. https://github.com/prajal/nxutil. Accessed: 2021-12-24.
 Waf-brain. https://github.com/BBVA/wafbrain. Accessed: 2021-12-24.
 Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323. JMLR Workshop and Conference
 Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Lihong Li. A perspective on off-policy evaluation in reinforcement learning. Frontiers of Computer Science, 13(5):911–912, 2019.