Image Processing
Morteza Akbari; Seyyed Mohammad Razavi; Sajad Mohamadzadeh
Abstract
Background and Objectives: Multi-object tracking in dense, multi-camera environments remains challenging due to occlusions, lighting variations, and fragmented trajectories. While existing methods rely on hierarchical two-step approaches or complex Bayesian filters, they often fail to fully exploit spatio-temporal correlations or to achieve global consistency across cameras and frames. This study addresses these limitations by proposing a novel graph-based deep learning model for continuous person tracking that independently optimizes spatial and temporal associations.

Methods: The proposed model decomposes multi-camera tracking into two tasks: temporal association (linking objects across frames using velocity and time) and spatial association (aligning objects from multiple viewpoints). A spatio-temporal graph structure is constructed, with nodes representing detected objects and edges encoding their relationships. Message Passing Networks (MPNs) iteratively update node and edge features, while a graph consensus fusion module merges the spatial and temporal graphs for robust tracking. The model is trained using Focal Loss and evaluated on the Wildtrack and CAMPUS datasets.

Results: The model achieves state-of-the-art performance, with a MOTA score of 85.5% on Wildtrack and 77.4–87.4% on CAMPUS subsets. Key improvements include a 100% MT (mostly tracked) rate and a 0% ML (mostly lost) rate on CAMPUS, demonstrating exceptional robustness in occluded and crowded scenes. The IDF1 score of 87.2% highlights superior identity preservation, and the decoupled design reduces graph size, which improves scalability.

Conclusion: By decoupling spatial and temporal associations and leveraging graph-based optimization, the proposed model significantly enhances tracking accuracy and reliability in multi-camera settings. This work provides a framework for applications such as surveillance and autonomous systems, with future potential for attention mechanisms and adaptive graph integration.
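The Focal Loss used to train the model can be sketched in its standard binary form; the α and γ values below are the common defaults from the literature, not values reported in the paper:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights well-classified examples so
    training focuses on hard ones. alpha/gamma are generic defaults,
    not taken from the paper."""
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1.0 - alpha)  # class-balance weight
    return -at * (1.0 - pt) ** gamma * np.log(pt)
```

With gamma=0 and alpha=0.5 this reduces to a scaled cross-entropy, which makes the modulating factor easy to verify in isolation.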
Image Annotation and Retrieval
Sajad Mohamadzadeh; Mohammad Gharehbagh
Abstract
Background and Objectives: Content-Based Image Retrieval (CBIR) systems are crucial for managing the exponential growth of digital imagery. Traditional methods relying on handcrafted features often fail to scale and to capture semantic content. Although deep learning enhances retrieval quality, challenges persist in computational complexity and efficiency. This paper introduces a hybrid CBIR framework that combines unsupervised deep feature learning, adaptive hashing, and VP-Tree-based hierarchical search optimization. The proposed system, evaluated on CIFAR-10, an ImageNet subset, and a custom medical imaging dataset, achieves a mean average precision (mAP) of 96.1% and reduces retrieval latency by approximately 40% compared to conventional methods. By leveraging autoencoder-driven latent feature extraction and scalable metric space partitioning, the framework demonstrates superior scalability, retrieval speed, and accuracy for large-scale applications.

Methods: The proposed framework employs autoencoder-driven latent space encoding to extract compact yet semantically rich feature representations, ensuring robust discriminability across diverse image categories. To enhance retrieval efficiency, a hybrid search mechanism is implemented: a Euclidean-based nearest neighbor scheme, O(N log N), is used for moderate-scale datasets, while a VP-Tree-based hashing scheme, O(log N), is applied for large-scale retrieval scenarios. By leveraging hierarchical metric space partitioning, the method significantly reduces search complexity while maintaining retrieval accuracy.

Results: Extensive evaluations show that the proposed framework outperforms traditional and modern deep hashing techniques, achieving higher mean average precision, lower search latency, and better storage efficiency for both moderate and large-scale datasets. By integrating unsupervised representation learning, advanced hashing, and optimized search structures, the system surpasses conventional methods in speed and precision.

Conclusion: This study presents a highly scalable and computationally efficient CBIR framework that addresses the limitations of existing methods by combining unsupervised deep feature learning, adaptive hashing, and hierarchical search structures. The results highlight the framework's ability to achieve high retrieval accuracy and efficiency, making it suitable for real-time applications in large-scale multimedia repositories.
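The VP-Tree that underpins the logarithmic search path can be illustrated with a minimal pure-Python sketch. This is a generic vantage-point tree with median splits and triangle-inequality pruning, not the paper's implementation:

```python
class VPTree:
    """Minimal vantage-point tree for exact nearest-neighbor search in a
    metric space; illustrative only, not the paper's code."""
    def __init__(self, points, dist):
        self.dist = dist
        self.vp = points[0]                    # first point becomes the vantage point
        self.mu = 0.0
        self.inside = self.outside = None
        rest = points[1:]
        if rest:
            ds = sorted(self.dist(self.vp, p) for p in rest)
            self.mu = ds[len(ds) // 2]         # median radius splits the space
            inner = [p for p in rest if self.dist(self.vp, p) < self.mu]
            outer = [p for p in rest if self.dist(self.vp, p) >= self.mu]
            if inner:
                self.inside = VPTree(inner, dist)
            if outer:
                self.outside = VPTree(outer, dist)

    def nearest(self, q, best=None):
        d = self.dist(self.vp, q)
        if best is None or d < best[0]:
            best = (d, self.vp)
        # visit the side containing q first; prune the other side with the
        # triangle inequality (skip it unless the ball crosses the boundary)
        first, second = (self.inside, self.outside) if d < self.mu else (self.outside, self.inside)
        if first is not None:
            best = first.nearest(q, best)
        if second is not None and abs(d - self.mu) <= best[0]:
            best = second.nearest(q, best)
        return best
```

The pruning rule is what yields the sub-linear average query cost: whole subtrees are skipped whenever the current best radius cannot cross the vantage point's median boundary.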
Classification
M. Rohani; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Recent advancements in race classification from facial images have been significantly propelled by deep learning techniques. Despite these advancements, many existing methodologies rely on intricate models that entail substantial computational costs and exhibit slow processing speeds. This study introduces an efficient and robust approach for race classification by utilizing transfer learning alongside a modified Efficient-Net model that incorporates attention-based learning.

Methods: In this research, Efficient-Net is employed as the base model, with transfer learning and attention mechanisms applied to enhance its efficacy in race classification tasks. The classifier component of Efficient-Net was strategically modified to minimize the parameter count, thereby enhancing processing speed without compromising classification accuracy. To address dataset imbalance, extensive data augmentation and random oversampling techniques were implemented. The modified model was rigorously trained and evaluated on a comprehensive dataset, with performance assessed through accuracy, precision, recall, and F1-score metrics.

Results: The modified Efficient-Net model exhibited remarkable classification accuracy on the UTK-Face and FairFace datasets while significantly reducing computational demands. Specifically, the model achieved an accuracy of 88.19% on UTK-Face and 66% on FairFace, a 2% enhancement over the base model. Additionally, it demonstrated a 9–14% reduction in memory consumption and parameter count. Real-time evaluations revealed a processing speed 14% faster than the base model, along with the highest F1-score results, underscoring its effectiveness for practical applications. Furthermore, the proposed method enhanced test accuracy by about 5% in classes with approximately 50% fewer training samples.

Conclusion: This study presents an efficient race classification model grounded in a modified Efficient-Net that utilizes transfer learning and attention-based learning to attain state-of-the-art performance. The proposed approach not only sustains high accuracy but also ensures rapid processing speeds, rendering it ideal for real-time applications. The findings indicate that this lightweight model can effectively rival more complex and computationally intensive recent methods, providing a valuable asset for practical race classification endeavors.
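The random oversampling step used against dataset imbalance can be sketched generically: minority-class samples are duplicated at random (with replacement) until every class matches the majority count. This is a standard balancing sketch, not the paper's exact pipeline:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    reaches the majority-class count. Generic illustration only."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    out_s, out_y = list(samples), list(labels)
    for y, group in by_class.items():
        for _ in range(target - len(group)):
            out_s.append(rng.choice(group))  # sample with replacement
            out_y.append(y)
    return out_s, out_y
```

In an image pipeline the duplicated samples would then pass through the augmentation stage, so repeated copies still yield distinct training inputs.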
Image Processing
S. Fooladi; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: The increasing prevalence of skin cancer highlights the urgency of early intervention, emphasizing the need for advanced diagnostic tools. Computer-assisted diagnosis (CAD) offers a promising avenue to streamline skin cancer screening and alleviate the associated costs.

Methods: This study develops an automatic segmentation system employing deep neural networks, seamlessly integrating data manipulation into the learning process. Utilizing an encoder-decoder architecture rooted in U-Net and augmented by the wavelet transform, the methodology facilitates the generation of high-resolution feature maps, thus bolstering the precision of the deep learning model.

Results: Performance evaluation metrics, including sensitivity, accuracy, Dice coefficient, and Jaccard similarity, confirm the superior efficacy of the model compared to conventional methodologies. The results showed an accuracy of 96.89% for skin lesions on the PH2 database and 95.8% on the ISIC 2017 database, which is promising compared to the results of other studies. Additionally, this research shows significant improvements in three metrics: sensitivity, Dice, and Jaccard. For the PH2 database, the values are 96, 96.40, and 95.40, respectively; for the ISIC database, they are 92.85, 96.32, and 95.24.

Conclusion: In image processing and analysis, numerous solutions have emerged to aid dermatologists in their diagnostic endeavors. The proposed algorithm was evaluated on the PH2 and ISIC 2017 databases, and the results were compared to recent studies. The proposed algorithm demonstrated superior accuracy, sensitivity, Dice coefficient, and Jaccard similarity scores when evaluated on the same database images compared to other methods.
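The Dice and Jaccard scores used for evaluation follow their standard set-overlap definitions; a minimal sketch for binary segmentation masks (generic metrics code, not the paper's):

```python
import numpy as np

def dice_jaccard(pred, gt, eps=1e-7):
    """Standard overlap metrics between two binary masks.
    Dice = 2|A∩B| / (|A|+|B|), Jaccard = |A∩B| / |A∪B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    jaccard = inter / (np.logical_or(pred, gt).sum() + eps)
    return dice, jaccard
```

Dice weights the intersection twice, so for the same prediction it is always at least as large as Jaccard, which matches the ordering of the reported values.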
Object Recognition
E. Ghasemi Bideskan; S.M. Razavi; S. Mohamadzadeh; M. Taghippour
Abstract
Background and Objectives: The recognition of facial expressions using metaheuristic algorithms is an active research topic in computer vision. This article presents an approach to identifying facial expressions using a filter optimized by metaheuristic algorithms.

Methods: The entire feature extraction process hinges on a filter optimally configured by a metaheuristic algorithm: the algorithm determines the optimal weights of the feature extraction filters and, along with the weights, the optimal filter sizes. As an initial step, the k-nearest neighbor classifier is employed for its simplicity and high accuracy. A final model is then presented that integrates the results of the filter bank with Multilayer Perceptron neural networks.

Results: The instances in the FER2013 database were analyzed using the proposed method. The model achieved a recognition rate of 78%, superior to other algorithms and methods while requiring less training time. In addition, the JAFFE database, a database of facial expressions of Japanese women, was utilized for validation; on this dataset, the proposed approach achieved a 94.88% accuracy rate, outperforming its competitors.

Conclusion: This article proposes a method for improving facial expression recognition using an optimized filter implemented through a metaheuristic based on the Kidney Algorithm (KA). In this approach, optimized filters were extracted using the KA metaheuristic together with the k-nearest neighbor and multilayer perceptron classifiers. By employing this approach, the optimal size and number of filters for facial expression recognition were determined so as to achieve the highest accuracy in the extraction process.
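The filter-weight optimization loop can be illustrated with plain random search standing in for the Kidney Algorithm, and leave-one-out 1-NN accuracy as the fitness function. Both the 1-D toy signals and this particular fitness are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def knn_fitness(weights, signals, labels):
    """Leave-one-out 1-NN accuracy on features produced by filtering
    each 1-D signal with `weights` (toy stand-in for the paper's fitness)."""
    feats = np.array([np.convolve(s, weights, mode='valid') for s in signals])
    correct = 0
    for i in range(len(feats)):
        d = np.linalg.norm(feats - feats[i], axis=1)
        d[i] = np.inf  # exclude the sample itself
        correct += labels[int(np.argmin(d))] == labels[i]
    return correct / len(feats)

def random_search(signals, labels, size=3, iters=200, seed=0):
    """Generic random search as a hedged stand-in for the KA metaheuristic:
    keep the best-scoring filter weights seen so far."""
    rng = np.random.default_rng(seed)
    best_w = rng.standard_normal(size)
    best_f = knn_fitness(best_w, signals, labels)
    for _ in range(iters):
        w = rng.standard_normal(size)
        f = knn_fitness(w, signals, labels)
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f
```

Any population-based metaheuristic (KA included) slots into the same shape: propose candidate filter weights, score them with the classifier-based fitness, keep the best.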
Image Annotation and Retrieval
A. Gheitasi; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Freehand sketching is an easy-to-use yet effective instrument for human-computer interaction. Sketches are highly abstract, and a domain gap exists between an intended sketch and the corresponding real image. In addition to appearance information, shape information is believed to be very effective in sketch recognition and retrieval.

Methods: In machine vision, understanding freehand sketches has grown more crucial due to the widespread use of touchscreen devices, yet the majority of sketch recognition and retrieval methods rely on appearance information alone. This article presents a hybrid network architecture, termed hybrid convolution, comprising two subnetworks: S-Net (Sketch Network) and A-Net (Appearance Network), which describe shape and appearance information, respectively. In addition, a Canonical Correlation Analysis (CCA) module is utilized to align the two domains and reduce the domain gap, enhancing sketch retrieval performance. Finally, sketch retrieval using the hybrid Convolutional Neural Network (CNN) and the CCA domain adaptation module is tested on several datasets, including Sketchy, TU-Berlin, and Flickr-15k. The experimental results demonstrate that the hybrid CNN with the CCA module achieves high accuracy compared to more sophisticated methods.

Results: The proposed method has been evaluated in two settings: image classification and Sketch-Based Image Retrieval (SBIR). The proposed hybrid convolution outperforms other baseline networks, achieving a classification score of 84.44% on the TU-Berlin dataset and 82.76% on the Sketchy dataset. In SBIR, the proposed method stands out among deep learning-based methods and outperforms non-deep methods by a significant margin.

Conclusion: This research presented the hybrid convolutional framework, a deep learning approach to pattern recognition. Compared to the best available methods, hybrid network convolution increases recognition and retrieval accuracy by around 5%. It is an efficient and thorough method that demonstrated valid results in sketch-based image classification and retrieval on the TU-Berlin, Flickr-15k, and Sketchy datasets.
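The CCA domain adaptation step admits a compact textbook implementation via whitening and an SVD; the sketch below is that generic formulation, not the paper's code. X and Y would hold the sketch-branch and appearance-branch features for paired samples:

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Textbook CCA: whiten each view's covariance, then take the SVD of
    the whitened cross-covariance. Returns projections Wx, Wy and the
    canonical correlations. `reg` is a small ridge for stability."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)        # S is symmetric positive definite
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U, Ky @ Vt.T, s               # project: X @ Wx, Y @ Wy
```

After fitting, projecting both feature sets with Wx and Wy maps them into a shared space where retrieval can use ordinary nearest-neighbor distances.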
Artificial Intelligence
S.M. Notghimoghadam; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Object detection has been a fundamental issue in computer vision. Research findings indicate that object detection aided by convolutional neural networks (CNNs) is still in its infancy despite having outpaced other methods.

Methods: This study proposes a straightforward, easily implementable, high-precision object detection method that detects objects with minimal error. Object detectors generally fall into one-stage and two-stage detectors; two-stage detectors are often more precise than one-stage detectors, despite running at a lower speed. In this study, a one-stage detector is proposed, and the results indicate its sufficient precision. The proposed method uses a feature pyramid network (FPN) to detect objects at multiple scales, combined with the ResNet-50 deep neural network.

Results: The proposed method is trained and tested on the Pascal VOC 2007 and MS COCO datasets, yielding a mean average precision (mAP) of 41.91% on Pascal VOC 2007 and 60.07% on MS COCO. The proposed method is also tested under additive noise: the test images of both datasets are corrupted with salt-and-pepper noise at levels up to 50%, and mAP is measured at each noise level. The investigations show that the proposed method provides acceptable results.

Conclusion: It can be concluded that using deep learning algorithms and CNNs, combined with a feature pyramid network, can significantly enhance object detection precision.
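The salt-and-pepper robustness test can be reproduced generically. In this sketch `level` is interpreted as the fraction of corrupted pixels, with corrupted pixels split evenly between black and white; that interpretation is an assumption, not a detail stated in the abstract:

```python
import numpy as np

def salt_and_pepper(img, level, seed=0):
    """Corrupt a fraction `level` of pixels: roughly half set to 255
    (salt), half to 0 (pepper). Generic noise injection, not the
    paper's exact protocol."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape) < level      # which pixels get corrupted
    salt = rng.random(img.shape) < 0.5        # salt vs pepper per pixel
    out[mask & salt] = 255
    out[mask & ~salt] = 0
    return out
```

Sweeping `level` from 0 to 0.5 and re-running evaluation at each step reproduces the kind of noise-versus-mAP curve described above.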
Video Processing
A. Akbari; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Video processing is an essential concern that has received wide attention in recent years, and social group detection is one of the most important issues in crowd analysis. For human-like robots, detecting groups and the relationships between their members is important. A group consists of two or more people moving in the same direction and at the same speed.

Methods: In the proposed method, a deep neural network (DNN) is applied to detect social groups using parameters including Euclidean distance, proximity distance, motion causality, trajectory shape, and heat maps. First, features are extracted for every pair of people in the video and assembled into a feature matrix; the DNN then learns social groups from this matrix.

Results: The goal is to detect social groups of two or more individuals. The proposed method, combining the DNN with the extracted features, detects social groups, and its output is compared with that of different methods.

Conclusion: In recent years, the use of deep neural networks (DNNs) for learning and detection has increased. In this work, DNNs are used to detect social groups from the extracted features. The quantitative results and the video outputs demonstrate the utility of DNNs with the extracted features.
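Two of the pairwise cues named above (Euclidean distance and a velocity-direction similarity, which relates to trajectory shape) can be sketched for a pair of trajectories. This is an illustrative subset of the feature set, not the paper's full feature extractor:

```python
import numpy as np

def pair_features(tr_a, tr_b):
    """Two pairwise cues for trajectories of shape (T, 2):
    mean Euclidean distance and mean cosine similarity of the
    frame-to-frame velocities. Illustrative subset only."""
    tr_a, tr_b = np.asarray(tr_a, float), np.asarray(tr_b, float)
    mean_dist = np.linalg.norm(tr_a - tr_b, axis=1).mean()
    va, vb = np.diff(tr_a, axis=0), np.diff(tr_b, axis=0)  # velocities
    cos = (va * vb).sum(1) / (
        np.linalg.norm(va, axis=1) * np.linalg.norm(vb, axis=1) + 1e-9)
    # small mean_dist and cos near 1 suggest members of the same group
    return mean_dist, cos.mean()
```

Computing such features for all pairs yields the feature matrix the DNN learns from: people walking side by side score a small distance and a direction similarity near 1.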