Author = Amir Mahdi Sadeghzadeh Mesgar

EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models

Articles in Press, Accepted Manuscript, Available Online from 01 January 2026

https://doi.org/10.22042/isecure.2026.242935

Mohammad Reza Mirbagheri, Seyed Mohammad Mahdi Mirkamali, Zahra Arani, Ali Javeri, Amir Mahdi Sadeghzadeh Mesgar, Rasool Jalili

Abstract Large Language Models (LLMs), trained on extensive datasets using advanced deep learning architectures, have demonstrated remarkable performance across a wide range of language tasks, becoming a cornerstone of modern AI technologies. However, ensuring their trustworthiness remains a critical challenge, as reliability is essential not only for accurate performance but also for upholding ethical, cultural, and social values. Careful alignment of training data and culturally grounded evaluation criteria is vital for developing responsible AI systems. In this study, we introduce the EPT (Evaluation of Persian Trustworthiness) metric, a culturally informed benchmark specifically designed to assess the trustworthiness of LLMs across six key aspects: Truthfulness, Safety, Fairness, Robustness, Privacy, and Ethical Alignment. We curated a labelled dataset and evaluated the performance of several leading models (ChatGPT, Claude, DeepSeek, Gemini, Grok, LLaMA, Mistral, and Qwen) using both automated LLM-based and human assessments. Our results reveal significant deficiencies in the safety dimension, underscoring the urgent need for focused attention on this critical aspect of model behaviour. Furthermore, our findings offer valuable insights into the alignment of these models with Persian ethical-cultural values and highlight critical gaps and opportunities for advancing trustworthy and culturally responsible AI. The dataset is publicly available at: https://github.com/Rezamirbagheri110/EPT-Benchmark.

Divergent Twins Fencing: Protecting Deep Neural Networks Against Query-based Black-box Adversarial Attacks

Volume 17, Issue 2, July 2025, Pages 137-150

https://doi.org/10.22042/isecure.2025.216615

Elahe Farshadfar, Amir Mahdi Sadeghzadeh Mesgar, Rasool Jalili

Abstract Recent advances in Machine Learning and Deep Learning have significantly expanded their applications in various domains. The resource-intensive process of training deep neural networks, in terms of substantial labeled data acquisition and computational power, makes these models valuable intellectual property for organizations, raising an increasingly crucial need to secure them. A major security threat to deep neural networks is the adversarial examples problem, specifically the black-box type. In these attacks, adversaries generate inputs with crafted, often imperceptible perturbations to deceive the model into incorrect classifications, all with no access to the model internals and solely by interacting with it via queries and responses. Of the two primary methods for creating black-box adversarial examples, i.e., model extraction-based and query-based approaches, this research focuses on the query-based type and explores a novel defense mechanism to mitigate its success. Our proposed method, called Divergent Twins Fencing (DTF), employs two subtly different models trained with two different loss functions to increase the execution burden of these attacks. The defense is evaluated by measuring the success rate and the average number of queries required to generate adversarial examples using two of the most potent attack methods from recent studies, and by comparing its defense performance against a leading defense strategy in the literature, i.e., the Random Noise Defense (RND) method, demonstrating our method’s efficacy in enhancing model security against black-box adversarial attacks.