Artificial Intelligence
S. H. Zahiri; R. Iranpoor; N. Mehrshad
Abstract
Background and Objectives: Person re-identification is an important application in computer vision, enabling the recognition of individuals across non-overlapping camera views. However, the large number of pedestrians with varying appearances, poses, and environmental conditions makes this task particularly challenging. To address these challenges, various learning approaches have been employed. Achieving a balance between speed and accuracy is a key focus of this research. Recently introduced Transformer-based models have made significant strides in machine vision, though they have limitations in terms of computation time and input data. This research aims to balance these models by reducing the input information, focusing attention solely on features extracted from a convolutional neural network model.
Methods: This research integrates convolutional neural network (CNN) and Transformer architectures. A CNN extracts important features of a person in an image, and these features are then processed by the attention mechanism in a Transformer model. The primary objective of this work is to enhance computational speed and accuracy in Transformer architectures.
Results: The results demonstrate an improvement in the performance of the architectures under consistent conditions. For the Market-1501 dataset, the mAP metric increased from approximately 30% in the downsized Transformer model to around 74% after applying the proposed modifications. Similarly, the Rank-1 metric improved from 48% to approximately 89%.
Conclusion: Although it still has limitations compared to larger Transformer models, the downsized Transformer architecture has proven to be much more computationally efficient. Applying similar modifications to larger models could also yield positive effects. Balancing computational cost against detection accuracy remains a relative goal that depends on the specific domain and priorities; choosing the appropriate method may emphasize one aspect over another.
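The Rank-1 and mAP figures quoted above are ranking metrics. The following is a minimal sketch of how they are commonly computed, assuming each query comes with a gallery ranking already sorted by similarity. The function names and the input layout are illustrative assumptions, not taken from the paper, and the camera-based filtering used in the standard Market-1501 protocol is omitted.

```python
from typing import List, Sequence, Tuple


def average_precision(query_id: str, ranked_ids: Sequence[str]) -> float:
    """Average precision of one ranked gallery list for one query identity."""
    hits = 0
    precision_sum = 0.0
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            # Precision at each position where a correct match appears.
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def rank1_and_map(queries: List[Tuple[str, Sequence[str]]]) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mean average precision) over all queries.

    Each query is (query identity, gallery identities sorted by similarity).
    """
    if not queries:
        raise ValueError("at least one query is required")
    top1_hits = sum(1 for qid, ranked in queries if ranked and ranked[0] == qid)
    ap_total = sum(average_precision(qid, ranked) for qid, ranked in queries)
    return top1_hits / len(queries), ap_total / len(queries)
```

Rank-1 only checks whether the top match is correct, while mAP rewards placing every correct match early, which is why the two values can differ widely for the same model.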
Deep Learning
Z. Raisi; V. M. Nazarzehi Had; E. Sarani; R. Damani
Abstract
Background and Objectives: Research on right-to-left scripts, particularly Persian text recognition in wild images, is limited by the lack of a comprehensive benchmark dataset. Applying state-of-the-art (SOTA) techniques trained on existing Latin or multilingual datasets often results in poor recognition performance for Persian scripts. This study aims to bridge this gap by introducing a comprehensive dataset for Persian text recognition and evaluating SOTA models on it.
Methods: We propose a Farsi (Persian) text recognition (FATR) dataset, which includes challenging images captured in various indoor and outdoor environments. Additionally, we introduce FATR-Synth, the largest synthetic Persian text dataset, containing over 200,000 cropped word images designed for pre-training scene text recognition models. We evaluate five SOTA deep learning-based scene text recognition models on the proposed datasets using the standard word recognition accuracy (WRA) metric. We also compare the performance of these recent architectures qualitatively on challenging sample images from the FATR dataset.
Results: Our experiments demonstrate that the performance of SOTA recognition models declines significantly when tested on the FATR dataset. However, when trained on synthetic and real-world Persian text datasets, these models demonstrate improved performance on Persian scripts.
Conclusion: Introducing the FATR dataset enhances the resources available for Persian text recognition, improving model performance. The proposed datasets, trained models, and code are available at https://github.com/zobeirraisi/FATDR.
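As an illustration of the WRA metric mentioned above, here is a minimal sketch that counts a prediction as correct only when the whole word matches the ground truth. The function name is hypothetical, and the Unicode NFC normalization step is an assumption added for this sketch; the paper's exact matching rules are not stated in the abstract.

```python
import unicodedata
from typing import Sequence


def word_recognition_accuracy(predictions: Sequence[str],
                              ground_truths: Sequence[str]) -> float:
    """Fraction of cropped word images whose predicted string matches exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must have equal length")
    if not ground_truths:
        return 0.0
    correct = 0
    for predicted, expected in zip(predictions, ground_truths):
        # Assumption: compare NFC-normalized strings so that equivalent
        # Unicode encodings of the same word are not counted as errors.
        if unicodedata.normalize("NFC", predicted) == unicodedata.normalize("NFC", expected):
            correct += 1
    return correct / len(ground_truths)
```

Because a single wrong character makes the whole word count as an error, WRA is stricter than character-level accuracy.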
Natural Language Processing
M. Khazeni; M. Heydari; A. Albadvi
Abstract
Background and Objectives: The lack of a suitable tool for analyzing conversational Persian texts has made various analyses of these texts, including sentiment analysis, difficult. In this research, we tried to make these texts easier for machines to understand by providing PSC (Persian Slang Convertor), a tool for converting conversational texts into formal ones, and to improve sentiment learning on short Persian texts by combining PSC with up-to-date deep learning methods.
Methods: More than 10 million unlabeled texts from various social networks and movie subtitles (as dialogue texts) and about 10 million news texts (as formal texts) were used to train unsupervised models and to implement the formalization tool. 60,000 comments from Instagram users, labeled positive, negative, or neutral, served as supervised data for training the sentiment classification model for short texts. Recent methods such as LSTM, CNN, BERT, and ELMo, along with deep learning training techniques such as learning rate decay, regularization, and dropout, were used. The best accuracy was achieved with LSTM.
Results: Using the formalization tool, 57% of the words in the conversational corpus were converted. Finally, using the formalizer, a FastText model, and a deep LSTM network, an accuracy of 81.91 was obtained on the test data.
Conclusion: In this research, models were pre-trained using unlabeled data, and in some cases existing pre-trained models such as ParsBERT were used. Then, a model was implemented to classify the sentiment of short Persian texts using labeled data.
Computer Vision
R. Iranpoor; S. H. Zahiri
Abstract
Background and Objectives: Person re-identification, which matches a person across non-overlapping cameras, is a significant application in computer vision. However, it is a challenging task because of the large number of pedestrians with various poses and appearances seen from different camera viewpoints. Consequently, various learning approaches have been employed to overcome these challenges. Using methods that strike an appropriate balance between speed and accuracy is also a key consideration in this research.
Methods: Since one of the key challenges is reducing computational costs, the initial focus is on evaluating various methods. Subsequently, these methods are improved by adding components with low computational cost to the networks. The most significant of these modifications is the addition of an Image Re-Retrieval Layer (IRL) to the backbone network to investigate changes in accuracy.
Results: Given that increasing computational speed is a fundamental goal of this work, the MobileNetV2 architecture was used as the backbone network. The IRL block was designed to have minimal impact on computational speed. With this component, the CUHK03 dataset showed a 5% increase in mAP and a 3% increase in Rank-1. For the Market-1501 dataset, the improvement is only partially evident. Comparisons with more complex architectures show a significant increase in computational speed for these methods.
Conclusion: Reducing computational costs and increasing relative recognition accuracy are interdependent objectives. Depending on the specific context and priorities, one might emphasize one over the other when selecting an appropriate method. The changes applied in this research can lead to better results in method selection, striking a balance between computational efficiency and recognition accuracy.
Artificial Intelligence
S. Nemati
Abstract
Background and Objectives: Community question-answering (CQA) websites have become increasingly popular as platforms for individuals to seek and share knowledge. Identifying users with a particular shape of expertise on CQA websites is a beneficial task for both companies and individuals. Specifically, finding those who have a general understanding of certain areas but lack expertise in other fields is crucial for companies that are planning internship programs. These users, called dash-shaped users, are willing to work for low wages and have the potential to quickly develop into skilled professionals, thus minimizing the risk of unsuccessful recruitment. Because of their vast number of users, CQA websites are valuable resources for finding individuals with various levels of expertise. This study is the first of its kind to directly classify CQA users based solely on the textual content of their posts.
Methods: To achieve this objective, we propose an ensemble of advanced deep learning algorithms and traditional machine learning methods for the binary classification of CQA users into two categories: those with dash-shaped expertise and those without. In the proposed method, we used stacked generalization to fuse the results of the deep and machine learning methods. To evaluate the effectiveness of our approach, we conducted an extensive experiment on three large datasets focused on Android, C#, and Java topics extracted from the Stack Overflow website.
Results: The results on four Stack Overflow datasets demonstrate that our ensemble method not only outperforms baseline methods, including seven traditional machine learning models and six deep models, but also achieves higher performance than state-of-the-art deep models by an average of 10% in accuracy and F1-measure.
Conclusion: The proposed model showed promising results, confirming that users of CQA websites can be classified using only the textual content of their questions. Specifically, the results showed that, using the contextual content of the questions, the proposed model can detect dash-shaped users precisely. Moreover, the proposed model is not limited to detecting dash-shaped users. It can also classify other shapes of expertise, such as T- and C-shaped users, which are valuable for forming agile software teams. Additionally, our model can be used as a filter method for downstream applications, such as intern recommendation.
Image Annotation and Retrieval
A. Gheitasi; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Freehand sketching is an easy-to-use but effective instrument for human-computer interaction. Sketches are highly abstract, which creates a domain gap between the intended sketch and the real image. In addition to appearance information, shape information is believed to be very useful in sketch recognition and retrieval.
Methods: In machine vision, understanding freehand sketches has become more important with the widespread use of touchscreen devices. The majority of sketch recognition and retrieval methods rely on appearance information. This article presents a hybrid network architecture, referred to as hybrid convolution, comprising two networks: S-Net (Sketch Network) and A-Net (Appearance Network). These subnetworks describe shape and appearance information, respectively. In addition, a Canonical Correlation Analysis (CCA) module is used to align the domains and reduce the domain gap, improving sketch retrieval performance. Finally, sketch retrieval using the hybrid Convolutional Neural Network (CNN) and the CCA domain adaptation module is tested on several datasets, including Sketchy, TU-Berlin, and Flickr-15k. The final experimental results demonstrate that, compared to more sophisticated methods, the hybrid CNN and CCA module achieves high accuracy.
Results: The proposed method has been evaluated in two tasks: image classification and Sketch-Based Image Retrieval (SBIR). The proposed hybrid convolution works better than other basic networks. It achieves a classification score of 84.44% on the TU-Berlin dataset and 82.76% on the Sketchy dataset. Additionally, in SBIR, the proposed method stands out among deep learning-based methods and outperforms non-deep methods by a significant margin.
Conclusion: This research presented a hybrid convolutional framework based on deep learning for pattern recognition. Compared to the best available methods, the hybrid convolutional network increased recognition and retrieval accuracy by around 5%. It is an efficient and thorough method that demonstrated valid results in sketch-based image classification and retrieval on the TU-Berlin, Flickr-15k, and Sketchy datasets.
Computer Vision
Z. Raisi; J. Zelek
Abstract
Background and Objectives: Signage is everywhere, and a robot should be able to take advantage of signs to help it localize (including Visual Place Recognition (VPR)) and map. Robust text detection and recognition in the wild is challenging due to pose, irregular text instances, illumination variations, viewpoint changes, and occlusion.
Methods: This paper proposes an end-to-end scene text spotting model that simultaneously outputs the text string and bounding boxes. The proposed model leverages a pre-trained Vision Transformer (ViT) based architecture combined with a multi-task transformer-based text detector that is more suitable for the VPR task. Our central contribution is an end-to-end scene text spotting framework that adequately captures irregular and occluded text regions in different challenging places. To address the occlusion problem, we first equip the ViT backbone with a masked autoencoder (MAE) to capture partially occluded characters. Then, we use a multi-task prediction head to handle arbitrarily shaped text instances with polygon bounding boxes.
Results: The proposed architecture's performance for VPR was evaluated in several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset. The well-known Precision-Recall metric was used to measure the performance of the proposed pipeline. The final model achieved Recall = 0.93 and Precision = 0.8 on this benchmark.
Conclusion: The initial experimental results show that the proposed model outperforms state-of-the-art (SOTA) methods on the SCTP dataset, which confirms the robustness of the proposed end-to-end scene text detection and recognition model.
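For readers who want a single summary number, precision and recall can be combined into an F1 score (their harmonic mean). The abstract does not report F1; the short sketch below only shows the arithmetic applied to the precision and recall values stated above, and the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Derived from the reported Precision = 0.8 and Recall = 0.93 (not a reported result).
derived_f1 = f1_score(0.8, 0.93)
print(f"{derived_f1:.3f}")  # prints 0.860
```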
Deep Learning
M. Taherparvar; F. Ahmadi Abkenari; P. Bayat
Abstract
Background and Objectives: Social network embedding has attracted considerable attention from researchers. The aim of network embedding is to learn a low-dimensional representation of each network vertex while maintaining the structure and characteristics of the network. Most existing network embedding methods focus only on preserving the structure of networks and mostly ignore semantic and centrality-based information. Moreover, vertex selection in existing methods is done blindly (greedily).
Methods: In this paper, a comprehensive algorithm called CSRW (centrality- and semantic-based random walk) is proposed for network embedding. It is based on the main centrality criteria, the semantic impact of the textual information of each vertex, and the impact of neighboring nodes. In CSRW, textual analysis is based on the BTM topic modelling approach, and the final representation is produced using the Skip-Gram model.
Results: The experiments show the robustness of the proposed method compared with existing classical approaches such as DeepWalk, CARE, CONE, COANE, and DCB in terms of vertex classification and link prediction. For link prediction in a subgraph with 5000 members, an accuracy of 0.91 was reached using closeness centrality, which is better than the other methods.
Conclusion: The CSRW algorithm is scalable and has achieved higher accuracy on larger datasets.
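To make the idea of a centrality-guided walk concrete, here is a minimal sketch in which the next vertex is sampled with probability proportional to a centrality score rather than uniformly. This is not the CSRW algorithm itself: degree centrality stands in for the paper's centrality criteria, the semantic (BTM) weighting is left out, and all function names are assumptions. The resulting walks would then be fed to a Skip-Gram model, as in DeepWalk-style methods.

```python
import random
from typing import Dict, List


def degree_centrality(graph: Dict[str, List[str]]) -> Dict[str, float]:
    """Degree of each vertex divided by the maximum possible degree."""
    n = len(graph)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in graph.items()}


def centrality_biased_walk(graph: Dict[str, List[str]], start: str, length: int,
                           scores: Dict[str, float], rng: random.Random) -> List[str]:
    """Random walk that prefers neighbors with a higher centrality score."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        weights = [scores[v] for v in neighbors]
        if sum(weights) <= 0:
            weights = None  # fall back to a uniform choice
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk = centrality_biased_walk(graph, "a", 6, degree_centrality(graph), random.Random(0))
```

Swapping `degree_centrality` for another score (for example closeness centrality, which the abstract mentions) changes only the `scores` argument.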
Computer Vision
S. H. Safavi; M. Sadeghi; M. Ebadpour
Abstract
Background and Objectives: Recognition of Persian Road Surface Markings (PRSMs) is a prerequisite for future intelligent vehicles in Iran. First, the presence of Persian text on Road Surface Markings (RSMs) makes recognition challenging. Second, an RSM can appear on the road in different qualities: poor, fair, or excellent. Since the type of poor-quality RSM varies from one province to another (i.e., with varying road structure and scene complexity), recognizing unforeseen poor-quality RSMs is an essential and challenging task. Third, almost all existing datasets have imbalanced classes, which affects recognition accuracy.
Methods: To address the first challenge, the proposed Persian Road Surface Recognizer (PRSR) approach hierarchically separates text and symbols before recognition. To this end, the Symbol Text Separator Network (STS-Net) is proposed. The proposed Text Recognizer Network (TR-Net) and Symbol Recognizer Network (SR-Net) then recognize the text and the symbol, respectively. To investigate the second challenge, we introduce two scenarios. Scenario A uses conventional random splitting of training and testing data. Because the PRSM dataset includes a few images taken at different distances from each RSM scene, it is highly probable that at least one of these images appears in the training set, which makes recognition easy. Since any province of Iran may present a new type of poor-quality RSM not seen in the training set, we design a realistic and challenging Scenario B in which the network is trained on excellent- and fair-quality RSMs and tested on poor-quality ones. In addition, we propose using data augmentation to overcome the class imbalance.
Results: The proposed approach achieves reliable performance (a precision of 73.37% for Scenario B) on the PRSM dataset. It significantly improves recognition accuracy, by up to 15% in different scenarios.
Conclusion: Since PRSMs include both Persian text (in different styles) and symbols, separating text and symbols with the proposed STS-Net prior to recognition could increase the recognition rate. Deploying new, powerful networks and investigating new techniques, in addition to data augmentation, for dealing with class imbalance in the PRSM recognition problem would be interesting future work.
Artificial Intelligence
S.M. Notghimoghadam; H. Farsi; S. Mohamadzadeh
Abstract
Background and Objectives: Object detection has been a fundamental issue in computer vision. Research findings indicate that object detection aided by convolutional neural networks (CNNs) is still in its infancy despite having outpaced other methods.
Methods: This study proposes a straightforward, easily implementable, and high-precision object detection method that detects objects with minimal error. Object detectors generally fall into one-stage and two-stage detectors. Unlike one-stage detectors, two-stage detectors are often more precise, despite performing at a lower speed. In this study, a one-stage detector is proposed, and the results indicate that its precision is sufficient. The proposed method uses a feature pyramid network (FPN) to detect objects at multiple scales. This network is combined with the ResNet-50 deep neural network.
Results: The proposed method is trained and tested on the Pascal VOC 2007 and MS COCO datasets. It yields a mean average precision (mAP) of 41.91% on Pascal VOC 2007 and 60.07% on MS COCO. The proposed method is also tested under additive noise: the test images of both datasets are corrupted with salt-and-pepper noise to obtain the mAP at different noise levels, up to 50%. The investigations show that the proposed method provides acceptable results.
Conclusion: It can be concluded that using deep learning algorithms and CNNs and combining them with a feature pyramid network can significantly enhance object detection precision.
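The robustness test described above corrupts test images with salt-and-pepper noise at increasing levels. A minimal sketch of such a corruption step follows, assuming a grayscale image stored as rows of 0-255 integers and a noise level given as the fraction of pixels to corrupt. The function name and the equal split between salt and pepper are assumptions; the paper's exact noise procedure is not given in the abstract.

```python
import random
from typing import List


def add_salt_and_pepper(image: List[List[int]], density: float,
                        rng: random.Random) -> List[List[int]]:
    """Return a copy of `image` where roughly `density` of the pixels are
    replaced by 255 (salt) or 0 (pepper) with equal probability."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must be between 0 and 1")
    noisy = [row[:] for row in image]  # leave the input image untouched
    for row in noisy:
        for col in range(len(row)):
            if rng.random() < density:
                row[col] = 255 if rng.random() < 0.5 else 0
    return noisy


# Example: corrupt a flat gray image at the 50% level mentioned above.
gray = [[128] * 50 for _ in range(50)]
noisy_half = add_salt_and_pepper(gray, 0.5, random.Random(0))
```

Evaluating the detector on copies produced at several densities gives one mAP value per noise level.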
Machine Learning
H. Nunoo-Mensah; S. Wewoliamo Kuseh; J. Yankey; F. A. Acheampong
Abstract
Background and Objectives: To a large extent, low production of maize can be attributed to diseases and pests. Accurate, fast, and early detection of maize plant disease is critical for efficient maize production. Early detection of a disease enables growers, breeders, and researchers to apply appropriate control measures to mitigate the disease's effects. Unfortunately, the lack of expertise in this area and the cost involved often result in incorrect diagnosis of maize plant diseases, which can cause significant economic loss. Over the years, many techniques have been developed for the detection of plant diseases. In recent years, computer-aided methods, especially machine learning (ML) techniques combined with crop images (image-based phenotyping), have become dominant for plant disease detection. Among machine learning approaches, deep learning (DL) techniques have demonstrated high accuracy on complex, human-like cognitive tasks. This paper aims to present a comprehensive review of state-of-the-art DL techniques used for detecting disease in the leaves of maize.
Methods: To achieve the aims of this paper, we divided the methodology into two main sections: article selection and detailed review of the selected articles. An algorithm was used to select state-of-the-art DL techniques for maize disease detection published from 2016 to 2021. Each selected article was then reviewed in detail, taking into consideration the DL technique, the dataset used, and the strengths and limitations of each technique.
Results: DL techniques have demonstrated high accuracy in maize disease detection. Transfer learning was found to reduce training time and improve model accuracy. Models trained with images taken in a controlled environment (single leaves) perform poorly when deployed in the field, where there are several leaves. Two-stage object detection models show superior performance when deployed in the field.
Conclusion: From the results, the lack of experts to annotate accurately, model architecture, hyperparameter tuning, and training resources are some of the challenges facing maize leaf disease detection. DL techniques based on two-stage object detection algorithms are best suited to images with several plant leaves and complex backgrounds.
Computer Vision
M. Taheri; M. Rastgarpour; A. Koochari
Abstract
Background and Objectives: Medical image segmentation is a challenging task due to low contrast between the region of interest and other textures, hair artifacts in dermoscopic images, illumination variations in images such as chest X-rays, and varying image acquisition conditions.
Methods: In this paper, we use a novel method based on Convolutional Neural Networks (CNNs) for medical image segmentation and compare our results with two well-known architectures, U-Net and FCN. For the loss function, we use both Jaccard distance and binary cross-entropy, and the optimization algorithm is SGD with Nesterov momentum. The method uses two preprocessing steps: resizing the images to speed up processing, and image augmentation to improve the results of the network. Finally, we apply a threshold technique as postprocessing on the outputs of the neural network to improve the contrast of the images. We evaluated our model on the well-known, publicly available PH2 database for melanoma lesion segmentation and on chest X-ray images. As mentioned, these two types of medical images contain hair artifacts and illumination variations, so they let us show the robustness of our method for segmenting such images and compare it with other methods.
Results: Experimental results showed that this method outperformed two other well-known architectures, U-Net and FCN. Additionally, it improved on the performance metrics previously reported for dermoscopic and chest X-ray segmentation.
Conclusion: In this work, we have proposed an encoder-decoder framework based on deep convolutional neural networks for medical image segmentation on dermoscopic and chest X-ray images. Two image augmentation techniques, image rotation and horizontal flipping, are applied to the training dataset before feeding it to the network for training. The predictions produced by the model on test images were postprocessed using the threshold technique to remove the blurry boundaries around the predicted lesions.
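The Jaccard distance loss and the threshold postprocessing mentioned above can be sketched as follows, assuming the predicted mask is a flat list of per-pixel probabilities and the target is a flat list of 0/1 labels. The soft (product-based) form of the intersection, the smoothing term `eps`, and the 0.5 cut-off are common choices assumed for this sketch, not values taken from the paper.

```python
from typing import List, Sequence


def jaccard_distance(pred: Sequence[float], target: Sequence[float],
                     eps: float = 1e-7) -> float:
    """Soft Jaccard distance: 1 - |A ∩ B| / |A ∪ B| on per-pixel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    # eps avoids division by zero when both masks are empty.
    return 1.0 - (intersection + eps) / (union + eps)


def threshold_mask(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarize predicted probabilities to remove soft, blurry boundaries."""
    return [1 if p >= threshold else 0 for p in probabilities]
```

A distance of 0 means the predicted and target masks overlap perfectly, and 1 means they do not overlap at all, so minimizing it directly optimizes region overlap rather than per-pixel agreement.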