Document Type: Original Research Paper
Authors
Department of Biomedical Engineering, Ma.C., Islamic Azad University, Mashhad, Iran.
Abstract
Background and Objectives: While deep learning has significantly advanced visual content recognition, existing models primarily rely on image data alone, neglecting the rich cognitive context embedded in neural responses. This study aimed to develop and validate a novel framework that synergistically integrates electroencephalography (EEG) signals with visual features to achieve superior accuracy in multiclass image recognition.
Methods: We designed a hierarchical attention-based deep learning architecture to fuse neural and visual information. EEG data were recorded during visual stimulus presentation, forming a new dataset developed by the authors. The recordings were preprocessed and analyzed using temporal models (RNN-CNN and LSTM) to extract neural features. In parallel, visual features were extracted from the stimulus images using the ResNet101 and DenseNet201 architectures. The proposed attention mechanism dynamically weighted and integrated these multimodal features, prioritizing the most salient information from each modality.
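The weighting step described in the Methods can be illustrated with a minimal sketch. It is not the authors' architecture: the function names `softmax` and `attention_fuse`, the single `query` vector, and the assumption that the EEG and image features have already been projected to a common dimension are all hypothetical choices made for illustration. The sketch scores each modality against the query, turns the two scores into weights with a softmax, and returns the weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(eeg_feat, img_feat, query):
    """Fuse two equal-length feature vectors with modality-level attention.

    Assumption: both feature vectors and the (normally learned) query
    vector share one dimension. Returns the fused vector and the
    (EEG, image) attention weights.
    """
    if not (len(eeg_feat) == len(img_feat) == len(query)):
        raise ValueError("all vectors must share one dimension")
    scale = math.sqrt(len(query))
    # Scaled dot-product score of each modality against the query.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in (eeg_feat, img_feat)
    ]
    w_eeg, w_img = softmax(scores)
    fused = [w_eeg * e + w_img * i for e, i in zip(eeg_feat, img_feat)]
    return fused, (w_eeg, w_img)


# Illustrative values only; they are not taken from the study.
fused, (w_eeg, w_img) = attention_fuse(
    eeg_feat=[0.2, 0.1, 0.4, 0.3],
    img_feat=[0.9, 0.5, 0.1, 0.7],
    query=[1.0, 0.5, -0.5, 1.0],
)
print("weights (EEG, image):", round(w_eeg, 3), round(w_img, 3))
```

In a trained model the query and the projections would be learned parameters, and a hierarchical design would presumably apply attention at more than one level; the sketch covers only a single modality-level weighting.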
Results: The proposed framework significantly outperformed conventional unimodal approaches. Among the evaluated configurations, the hybrid RNN-CNN + ResNet101 model achieved the highest classification accuracy. A feature contribution analysis revealed that optimal performance was attained with approximately 60% of the contribution coming from image-derived features and 40% from EEG-derived features, demonstrating the complementary value of neural data.
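The abstract does not state how the feature contribution analysis was computed. One simple way to obtain percentage shares like these, assuming per-sample modality attention weights are available, is to average the weights over samples and normalize. The function name `modality_contributions` and the sample weights below are hypothetical and serve only to show the arithmetic.

```python
def modality_contributions(weight_pairs):
    """Average per-sample (image, EEG) weights into percentage shares.

    Assumption: each pair holds non-negative attention weights for one
    sample; this is an illustrative calculation, not the study's method.
    """
    if not weight_pairs:
        raise ValueError("need at least one sample")
    n = len(weight_pairs)
    mean_img = sum(w_img for w_img, _ in weight_pairs) / n
    mean_eeg = sum(w_eeg for _, w_eeg in weight_pairs) / n
    total = mean_img + mean_eeg
    return 100.0 * mean_img / total, 100.0 * mean_eeg / total


# Made-up weights for three samples, chosen only to illustrate the arithmetic.
samples = [(0.7, 0.3), (0.5, 0.5), (0.6, 0.4)]
img_share, eeg_share = modality_contributions(samples)
print(f"image: {img_share:.1f}%, EEG: {eeg_share:.1f}%")
```

Other definitions of contribution, such as ablating one modality and measuring the drop in accuracy, are equally plausible and could give different shares.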
Conclusion: This study confirms that the structured, attention-based fusion of neurophysiological and visual data substantially enhances visual content recognition. The findings provide a robust and effective framework for advanced cognitive assessment applications and establish a new benchmark for multimodal integration in machine learning, highlighting the significant potential of EEG data to complement and improve computer vision tasks.
Keywords
- EEG–image Fusion
- Attention-based Deep Learning
- Multi-class Visual Content Classification
- Hierarchical Attention Mechanism
- RNN-CNN
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/
Publisher’s Note
JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Publisher
Shahid Rajaee Teacher Training University