A New Bit-Wise Approach for Image-in-Audio Steganography Using Deep Learning

Raeisian Dashtaki, Ebrahim; Ghaemmaghami, Shahrokh

doi:10.22042/isecure.2025.214367

Document Type: Research Article

Authors

Ebrahim Raeisian Dashtaki ¹

Shahrokh Ghaemmaghami ²

¹ Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran

² Electronics Research Institute, Sharif University of Technology, Tehran, Iran

https://doi.org/10.22042/isecure.2025.214367

Abstract

This work proposes a steganographic scheme that uses deep learning to embed RGB images in audio files, together with a new steganalysis approach. The image is embedded bit by bit into the audio in the frequency domain; because the payload is handled as a bit stream, the same scheme can carry other data types. The network follows an encoder-decoder architecture: the encoder embeds the bits into the audio, and the decoder extracts them from the stego audio. To raise the information transmission rate, the image is first compressed with a method based on the YUV color model, which reduces the amount of image data to be hidden and transmitted by up to 50%. The encoder-decoder architecture incorporates multiple paths to ease gradient flow during training. The proposed steganalysis network detects stego audio files that contain hidden messages by analyzing transform-domain features of the signal. In the reported results, the audio Signal-to-Noise Ratio (SNR) ranges from 26.9 to 39.6 dB and the image Peak Signal-to-Noise Ratio (PSNR) from 19.02 to 34.8 dB, which the authors present as evidence of enhanced security. Compared with other audio steganography schemes, the proposed method performs better in terms of both the imperceptibility of the audio cover modifications and the quality of the recovered hidden image.
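
The abstract's 50% payload reduction and its two quality metrics can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the color conversion uses the full-range JPEG-style YCbCr variant of YUV, the chroma planes are averaged over 2x2 blocks (4:2:0 subsampling), and the function names are hypothetical. It is not the authors' implementation. With full-resolution luma and quarter-resolution chroma, the payload is 1.5 bytes per pixel instead of 3, which matches the "up to 50%" figure.

```python
import numpy as np

def rgb_to_yuv_uint8(rgb):
    """Convert an HxWx3 uint8 RGB image to 8-bit Y, U, V planes.

    Assumption: full-range JPEG-style YCbCr coefficients, so every
    plane fits in [0, 255]. The paper's exact transform may differ.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    planes = np.stack([y, u, v], axis=-1)
    return np.clip(np.rint(planes), 0, 255).astype(np.uint8)

def subsample_420(yuv):
    """Keep Y at full resolution; average U and V over 2x2 blocks.

    Assumes even height and width.
    """
    h, w, _ = yuv.shape
    y = yuv[..., 0]
    chroma = yuv[..., 1:].astype(np.float64)
    chroma = chroma.reshape(h // 2, 2, w // 2, 2, 2).mean(axis=(1, 3))
    chroma = np.rint(chroma).astype(np.uint8)
    return y, chroma[..., 0], chroma[..., 1]

def to_bits(*planes):
    """Flatten uint8 planes into one bit stream (MSB first per byte)."""
    flat = np.concatenate([p.ravel() for p in planes])
    return np.unpackbits(flat)

def snr_db(cover, stego):
    """Signal-to-noise ratio of the stego audio relative to the cover."""
    cover = np.asarray(cover, dtype=np.float64)
    noise = cover - np.asarray(stego, dtype=np.float64)
    return 10.0 * np.log10(np.sum(cover ** 2) / np.sum(noise ** 2))

def psnr_db(original, recovered, peak=255.0):
    """Peak signal-to-noise ratio between two 8-bit images."""
    diff = np.asarray(original, np.float64) - np.asarray(recovered, np.float64)
    return 10.0 * np.log10(peak ** 2 / np.mean(diff ** 2))

# Demo on a random 64x64 image (placeholder data, not from the paper).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

raw_bits = to_bits(image)
payload_bits = to_bits(*subsample_420(rgb_to_yuv_uint8(image)))
payload_ratio = payload_bits.size / raw_bits.size  # 0.5 for even-sized images
print(f"bits to hide: {payload_bits.size} of {raw_bits.size} "
      f"(ratio {payload_ratio:.2f})")
```

The resulting bit stream is what a bit-wise encoder would embed. `snr_db` applies to the cover and stego waveforms, and `psnr_db` to the original and recovered images, which are the two quantities reported in the abstract.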

Keywords

Audio Steganography

Information Hiding

Signal Processing

Steganalysis Techniques
