Artificial Intelligence
K. Moeenfar; V. Kiani; A. Soltani; R. Ravanifard
Abstract
Background and Objectives: In this paper, a novel and efficient unsupervised machine learning algorithm named EiForestASD is proposed for distinguishing anomalies from normal data in data streams. The proposed algorithm leverages a forest of isolation trees to detect anomalous data instances. Methods: The proposed method, EiForestASD, incorporates an isolation forest as an adaptable detector model that adjusts to new data over time. To handle concept drift in the data stream, a window-based concept drift detection mechanism is employed that discards only those isolation trees that are incompatible with the new concept. The proposed method is implemented using the Python programming language and the Scikit-Multiflow library. Results: Experimental evaluations were conducted on six real-world and two synthetic data streams. The results reveal that EiForestASD reduces computation time by 19% and improves the anomaly detection rate by 9% compared to the baseline method iForestASD. These results highlight the efficacy and efficiency of EiForestASD for anomaly detection in data streams. Conclusion: The EiForestASD method handles concept change using an intelligent strategy in which only those trees of the detector model that are incompatible with the new concept are removed and reconstructed. This modification of the concept drift handling mechanism significantly reduces computation time and improves anomaly detection accuracy.
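The partial-tree-replacement idea described above can be pictured with a short sketch. This is our own approximation in Python, using scikit-learn's IsolationForest in place of the paper's Scikit-Multiflow implementation; the drift test (each tree's anomaly rate on the newest window) and all parameters are assumptions, not the paper's exact criterion.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

class WindowedIsolationForest:
    """Forest of single-tree detectors; incompatible trees are rebuilt per window."""

    def __init__(self, n_trees=10, drift_tol=0.3, seed=0):
        self.n_trees, self.drift_tol = n_trees, drift_tol
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def _new_tree(self, X):
        seed = int(self.rng.integers(1 << 30))
        return IsolationForest(n_estimators=1, random_state=seed).fit(X)

    def fit_window(self, X):
        if not self.trees:  # first window: build the whole forest
            self.trees = [self._new_tree(X) for _ in range(self.n_trees)]
            return
        # Rebuild only trees whose anomaly rate on the new window is implausibly
        # high, keeping the rest -- the partial-replacement idea from the abstract.
        self.trees = [
            t if np.mean(t.predict(X) == -1) < self.drift_tol else self._new_tree(X)
            for t in self.trees
        ]

    def score(self, X):
        # Mean decision_function across trees; lower values are more anomalous.
        return np.mean([t.decision_function(X) for t in self.trees], axis=0)
```

Because compatible trees survive a drift, only part of the forest is retrained per window, which is where the claimed computation-time saving would come from.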
Artificial Intelligence
S. Nemati
Abstract
Background and Objectives: Community question-answering (CQA) websites have become increasingly popular as platforms for individuals to seek and share knowledge. Identifying users with a particular shape of expertise on CQA websites is a beneficial task for both companies and individuals. Specifically, finding those who have a general understanding of certain areas but lack expertise in other fields is crucial for companies planning internship programs. These users, called dash-shaped users, are willing to work for low wages and have the potential to quickly develop into skilled professionals, thus minimizing the risk of unsuccessful recruitment. Due to the vast number of users on CQA websites, these sites provide valuable resources for finding individuals with various levels of expertise. This study is the first of its kind to directly classify CQA users based solely on the textual content of their posts. Methods: To achieve this objective, we propose an ensemble of advanced deep learning algorithms and traditional machine learning methods for the binary classification of CQA users into two categories: those with dash-shaped expertise and those without. In the proposed method, we used stacked generalization to fuse the results of the deep learning and traditional machine learning methods. To evaluate the effectiveness of our approach, we conducted extensive experiments on three large datasets focused on Android, C#, and Java topics extracted from the Stack Overflow website. Results: The results on four Stack Overflow datasets demonstrate that our ensemble method not only outperforms baseline methods, including seven traditional machine learning and six deep models, but also achieves higher performance than state-of-the-art deep models by an average of 10% in accuracy and F1-measure. Conclusion: The proposed model showed promising results, confirming that users of CQA websites can be classified using only the textual content of their questions.
Specifically, the results showed that, using the contextual content of the questions, the proposed model can detect dash-shaped users precisely. Moreover, the proposed model is not limited to detecting dash-shaped users; it can also classify other shapes of expertise, such as T- and C-shaped users, which is valuable for forming agile software teams. Additionally, our model can be used as a filtering step for downstream applications, such as intern recommendation.
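The stacked-generalization step can be sketched as follows. This is a minimal stand-in with classical scikit-learn base learners over TF-IDF features; the toy posts, labels, and choice of base models are ours (the paper's ensemble also fuses deep models), so treat it as the shape of the approach rather than the authors' pipeline.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical post texts; label 1 marks a dash-shaped (broad but shallow) user.
posts = [
    "how do i center a div in css",
    "basic question about java for loops",
    "simple android button click example",
    "advanced jvm garbage collection tuning internals",
    "custom memory allocator for unsafe c# code",
    "optimizing jit compilation inside the clr",
]
labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(random_state=0)),
            ("svm", LinearSVC(random_state=0)),
        ],
        final_estimator=LogisticRegression(),  # meta-learner fusing base outputs
        cv=2,  # out-of-fold predictions feed the meta-learner
    ),
)
model.fit(posts, labels)
```

The meta-learner sees only the base models' out-of-fold predictions, which is what lets stacking combine heterogeneous learners (deep or shallow) without leaking training labels.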
Artificial Intelligence
H. Karim Tabbahfar; F. Tabib Mahmoudi
Abstract
Background and Objectives: Considering drought and global warming, it is very important to monitor changes in water bodies for surface water management and to preserve water resources in the natural ecosystem. For this purpose, appropriate spectral indices have high capability to distinguish surface water bodies from other land covers. This research pays special attention to the effect of the different types of land cover around water bodies. For this reason, two different water bodies, a lake and a wetland, have been used to evaluate the implementation results. Methods: The main objective of this research is to evaluate the capability of the genetic algorithm in the optimal selection of spectral indices extracted from a Sentinel-2 satellite image in order to distinguish surface water bodies in two case studies: 1) the pure water behind the Karkheh dam and 2) the Shadegan wetland, where water is mixed with vegetation. In this regard, the set of optimal indices is obtained with the genetic algorithm followed by a support vector machine (SVM) classifier. Results: Evaluation of the classification results based on the optimally selected spectral indices showed that the overall accuracy and Kappa coefficient of the recognized surface water bodies are 98.18% and 0.9827 for the Karkheh dam and 98.04% and 0.93 for the Shadegan wetland, respectively. Each of the spectral indices measured in the two study areas was also evaluated using a decision tree (DT) classifier. Compared with the best obtained DT classification results, the indices selected by the genetic algorithm followed by the SVM classifier improve overall accuracy by 1.42% in the Karkheh dam area and 1.56% in the Shadegan wetland area.
Moreover, the obtained classification results are superior to those of a Random Forest classifier using the optimized set of spectral features. Conclusion: Applying the genetic algorithm to the spectral indices yielded two optimal sets of effective indices that achieve the highest accuracy in classifying water bodies against other land cover objects in the study areas. Considering collective performance, the genetic algorithm selects an optimal set of indices that can detect water bodies more accurately than any single index.
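The wrapper-style selection described above can be sketched as a simple genetic algorithm over a binary mask of spectral indices, with SVM cross-validation accuracy as the fitness. The operators, population size, and toy data below are our assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ga_select(X, y, n_gen=10, pop_size=12, p_mut=0.1, seed=0):
    """Return a boolean mask over the columns of X chosen by a simple GA."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0  # an empty index set is useless
        return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=3).mean()

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = int(rng.integers(1, n))                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < p_mut                  # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()].astype(bool)

# Toy stand-in for per-pixel spectral indices: only the first two columns matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask = ga_select(X, y)
```

Because the fitness scores the whole index subset at once, the GA can reward combinations of indices that no single index matches, which is the "collective performance" point made in the conclusion.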
Artificial Intelligence
K. Ali Mohsin Alhameedawi; R. Asgarnezhad
Abstract
Background and Objectives: Autism is a well-known disorder that can occur at any age. There is increasing interest in applying machine learning techniques to diagnose this incurable condition. However, the poor quality of most datasets hampers the production of efficient models for forecasting autism, and the lack of suitable pre-processing methods leads to inaccurate and unstable results. To diagnose the disease, techniques that improve classification performance have yielded better results, and other computerized technologies have been applied. Methods: An effective and high-performance model was introduced to address pre-processing problems such as missing values and outliers. Several base classifiers were applied to a well-known autism dataset in the classification stage. Among many alternatives, we observed that combining missing-value replacement with the mean and selection with Random Forest and Decision Tree techniques provided our highest results. Results: The best obtained accuracy, precision, recall, and F-measure values of the suggested MVO-Autism model were all equal to 100%, outperforming its counterparts. Conclusion: The obtained results reveal that the suggested model can increase classification performance in terms of the evaluation metrics. The results are evidence that the MVO-Autism model outperforms its counterparts, because it overcomes both pre-processing problems.
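The pre-processing-plus-classifier recipe can be pictured as a two-step scikit-learn pipeline. This is a minimal sketch with toy data, assuming mean replacement of missing values followed by a Random Forest; the actual MVO-Autism pipeline also covers outliers and the Decision Tree variant.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Toy feature matrix with missing values (np.nan) and binary labels.
X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

model = make_pipeline(
    SimpleImputer(strategy="mean"),         # replace missing values with the mean
    RandomForestClassifier(random_state=0),
)
model.fit(X, y)
```

Keeping imputation inside the pipeline ensures the means are learned from training data only, so the same replacement is applied consistently at prediction time.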
Artificial Intelligence
R. Mohammadi Farsani; E. Pazouki
Abstract
Background and Objectives: Many real-world problems are time series forecasting (TSF) problems. Therefore, providing more accurate and flexible forecasting methods has always been of interest to researchers. An important issue in forecasting a time series is the prediction time interval. Methods: In this paper, a new method is proposed for time series forecasting that can make more accurate predictions at larger intervals than other existing methods. Neural networks are an effective tool for estimating time series due to their nonlinearity and their applicability to different time series without specific information about them. A variety of neural networks have been introduced so far, some of which have been used for forecasting time series. Encoder-decoder networks are one example: an encoder network encodes the input data based on a particular pattern, and a decoder network then decodes the output based on the encoded input to produce the desired output. Since these networks have a better understanding of the context, they provide better performance. The Transformer is an example of this type of network. A Transformer neural network based on self-attention is presented that has special capability for forecasting time series problems. Results: The proposed model has been evaluated through experimental results on two benchmark real-world TSF datasets from different domains. The experimental results show that, compared to other well-known methods, the proposed model is up to eight times more robust in long-term estimation and achieves about a 20 percent improvement in estimation accuracy. Computational complexity has also been significantly reduced. Conclusion: The proposed tool can perform better than, or compete with, other introduced methods with less computational complexity and longer estimation intervals.
It was also found that, with a better configuration of the network and better adjustment of the attention mechanism, more desirable results can be obtained for any specific problem.
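At the core of the Transformer is scaled dot-product self-attention. The NumPy sketch below shows that computation over a history window; the window length and dimensions are illustrative, and the random matrices stand in for learned projection weights.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); returns one context vector per time step."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # scaled dot products
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over time steps
    return weights @ V                                # attention-weighted values

rng = np.random.default_rng(0)
seq, d = 26, 8                                        # e.g. a 26-step history window
X = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Because every time step attends to every other in one matrix product, the model can relate distant points of the series directly, which is what makes attention attractive for longer prediction intervals than step-by-step recurrent models.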
Artificial Intelligence
M. Yousefi; R. Akbari; S. M. R. Moosavi
Abstract
Background and Objectives: It is generally accepted that the highest cost in software development is associated with the software maintenance phase. In corrective maintenance, the main task is correcting the bugs found by the users. These bugs are submitted by the users to a Bug Tracking System (BTS). The bugs are evaluated by the bug triager and assigned to developers for correction. To find a suitable developer to correct a bug, the developers' recent activities and previous bug fixes must be examined. This paper presents an automated method to assign bugs to developers by identifying the similarity between new bugs and previously reported bug reports. Methods: For automatic bug assignment, four clustering techniques (Expectation-Maximization (EM), Farthest First, Hierarchical Clustering, and Simple K-means) are used, and a tag is created for each cluster that indicates the developer associated with bug correction. To evaluate the quality of the proposed methods, the clusters generated by the methods are compared with the labels suggested by an expert triager. Results: To evaluate the performance of the proposed method, we use real-world data of a large-scale web-based system stored in the BTS of a software company. To select the appropriate clustering algorithm, the outputs of each algorithm are compared to the labels suggested by the expert triager; the algorithm whose output is closest to the expert opinion is selected as the best. The results showed that the EM and Farthest First clustering algorithms, with a 3% similarity error, have the most similarity with the expert opinion. Conclusion: The results obtained by the algorithms show that we can successfully apply them for bug assignment in real-world software development environments.
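The triage idea — cluster report texts, then tag each cluster with a developer — can be sketched with TF-IDF and K-means (one of the several algorithms the paper compares). The report texts and developer tags below are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical bug reports: two about a login crash, two about PDF export.
reports = [
    "crash on the login form submit button",
    "another crash when opening the login form",
    "exported pdf report has corrupted tables",
    "table borders missing in the exported pdf",
]
X = TfidfVectorizer().fit_transform(reports)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each cluster carries a tag naming the developer who fixes that kind of bug;
# these tags are the labels an expert triager would supply.
cluster_to_dev = {0: "dev_a", 1: "dev_b"}
assigned = [cluster_to_dev[c] for c in km.labels_]
```

A new report would be routed with `km.predict` on its TF-IDF vector, and evaluation amounts to comparing `assigned` against the expert triager's labels, as the abstract describes.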
Artificial Intelligence
I. Behravan; S. M. Razavi
Abstract
Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this paper, a novel machine learning approach, which works in two phases, is introduced to predict the price of a stock on the next day based on the information extracted from the past 26 days. In the first phase, an automatic clustering algorithm clusters the data points into different clusters, and in the second phase a hybrid regression model, a combination of particle swarm optimization and support vector regression, is trained for each cluster. In this hybrid method, the particle swarm optimization algorithm is used for parameter tuning and feature selection. Results: The accuracy of the proposed method has been measured on the datasets of five companies active in the Tehran Stock Exchange market, using five different metrics. On average, the proposed method achieved 82.6% accuracy in predicting the stock price one day ahead. Conclusion: The achieved results demonstrate the capability of the method to detect sudden jumps in the price of a stock.
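The two-phase structure — cluster first, then one tuned regressor per cluster — can be sketched as below on synthetic data. Plain random search stands in for the paper's particle swarm optimization, and the clustering algorithm and parameter ranges are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))            # e.g. features from past trading days
y = X[:, 0] * 2 + np.sin(X[:, 1])        # toy next-day target

# Phase 1: partition the samples into clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Phase 2: for each cluster, tune and fit a support vector regressor.
models = {}
for c in range(3):
    idx = km.labels_ == c
    best, best_err = None, np.inf
    for _ in range(10):                  # random search over C and gamma
        C, gamma = 10 ** rng.uniform(-1, 2), 10 ** rng.uniform(-2, 0)
        m = SVR(C=C, gamma=gamma).fit(X[idx], y[idx])
        err = np.mean((m.predict(X[idx]) - y[idx]) ** 2)
        if err < best_err:
            best, best_err = m, err
    models[c] = best

def predict(x):
    c = km.predict(x.reshape(1, -1))[0]  # route the sample to its cluster's model
    return models[c].predict(x.reshape(1, -1))[0]
```

Splitting the data this way lets each regressor specialize in one price regime, which is how a clustered model can react to sudden jumps better than a single global regressor. (Scoring the search on training error, as here, is only for brevity; a held-out split would be used in practice.)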
Artificial Intelligence
M. Abdolahi; M. Zahedi
Volume 6, Issue 1, January 2018, Pages 15-24
Abstract
Background and Objectives: Discourse coherence modeling has become a critical but challenging task for content analysis in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods, such as entity-based and graph-based models, engage with the semantic and linguistic concepts of a text. This means the problem cannot be solved very well: these methods are limited to the word co-occurrence information available in sequential sentences within a short part of a text. One of the greatest challenges of these methods is their limitation in evaluating the coherence of long documents, being suitable only for documents with a small number of sentences. Methods: Our proposed method focuses on both local and global coherence. It can also assess the local topic integrity of a text at the paragraph level, regardless of word meaning and handcrafted rules. Global coherence in the proposed method is evaluated through sequential paragraph dependency. Building on word embeddings and statistical approaches, the presented method incorporates external word-correlation knowledge into short and long stories to assess local and global coherence simultaneously. Results: Using the combined effect of word2vec vectors and the most likely n-grams, we show that our proposed method is independent of the language and its semantic concepts. The derived results indicate that the proposed method offers higher accuracy than the other algorithms on long documents with a high number of sentences. Conclusion: Comparing our proposed method with the BGSEG method showed a 1.19 percent improvement in the mean degree of coherence evaluation.
The results in this study also indicate that the improvements are greater in larger texts with more sentences.
Copyrights © 2018 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.