Deep Learning Attention-based Framework for Integrating EEG and Image Information in Visual Content Recognition

Hakkak, Hamed; Khalilzadeh, Mohammad Mahdi; Azarnoosh, Mahdi; Kobravi, Hamid Reza

doi:10.22061/jecei.2026.12557.890

Document Type: Original Research Paper

Authors

Department of Biomedical Engineering, Ma.C., Islamic Azad University, Mashhad, Iran.

Abstract

Background and Objectives: While deep learning has significantly advanced visual content recognition, existing models primarily rely on image data alone, neglecting the rich cognitive context embedded in neural responses. This study aimed to develop and validate a novel framework that synergistically integrates electroencephalography (EEG) signals with visual features to achieve superior accuracy in multiclass image recognition.
Methods: We designed a hierarchical attention-based deep learning architecture to fuse neural and visual information. EEG data were recorded during visual stimulus presentation, forming a new dataset developed by the authors. These data were preprocessed and analyzed using temporal models (RNN-CNN and LSTM) to extract neural features. Concurrently, visual features were extracted from the stimulus images using the ResNet101 and DenseNet201 architectures. The proposed attention mechanism dynamically weighted and integrated these multimodal features, prioritizing the most salient information from each modality.
Results: The proposed framework significantly outperformed conventional unimodal approaches. The hybrid RNN-CNN + ResNet101 model achieved the highest classification accuracy. A feature contribution analysis revealed that optimal performance was attained with a contribution of approximately 60% from image-derived features and 40% from EEG-derived features, demonstrating the complementary value of neural data.
Conclusion: This study confirms that the structured, attention-based fusion of neurophysiological and visual data substantially enhances visual content recognition. The findings provide a robust and effective framework for advanced cognitive assessment applications and establish a new benchmark for multimodal integration in machine learning, highlighting the significant potential of EEG data to complement and improve computer vision tasks.
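
The attention-based weighting of EEG and image features described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the feature sizes (16 and 32), the shared dimension (8), the random projection matrices, and the query vector are all hypothetical placeholders, and the RNN-CNN and ResNet101 feature extractors are replaced by random vectors. The sketch only shows the general idea of scoring each modality, normalizing the scores with a softmax, and forming a weighted sum.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, vec):
    """Linear projection followed by tanh (matrix is a list of rows)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(eeg_feat, img_feat, w_eeg, w_img, query):
    """Fuse two modality feature vectors with softmax attention weights."""
    # Map both modalities into a shared space so they are comparable.
    h_eeg = project(w_eeg, eeg_feat)
    h_img = project(w_img, img_feat)
    # One relevance score per modality, then normalize to weights summing to 1.
    weights = softmax([dot(query, h_eeg), dot(query, h_img)])
    # The fused vector is the attention-weighted sum of the two projections.
    fused = [weights[0] * a + weights[1] * b for a, b in zip(h_eeg, h_img)]
    return fused, weights


# Hypothetical stand-ins for extracted EEG and image features.
rng = random.Random(0)
d, n_eeg, n_img = 8, 16, 32
eeg_feat = [rng.gauss(0, 1) for _ in range(n_eeg)]
img_feat = [rng.gauss(0, 1) for _ in range(n_img)]
w_eeg = [[rng.gauss(0, 0.1) for _ in range(n_eeg)] for _ in range(d)]
w_img = [[rng.gauss(0, 0.1) for _ in range(n_img)] for _ in range(d)]
query = [rng.gauss(0, 1) for _ in range(d)]

fused, weights = attention_fuse(eeg_feat, img_feat, w_eeg, w_img, query)
print("EEG weight, image weight:", weights)
```

In a trained model the projection matrices and query would be learned, so the two weights would reflect how much each modality contributes to a given prediction. The roughly 60/40 image-to-EEG split reported in the Results comes from the authors' own analysis and is not reproduced by this toy example.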

Keywords

Main Subjects

Multi-Source Signal Analysis

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Publisher

Shahid Rajaee Teacher Training University

Journal of Electrical and Computer Engineering Innovations (JECEI)

Volume 14, Issue 2
July 2026
Pages 565-582