Document Type: Original Research Paper

Authors

1 Department of Electrical Engineering, PhD student, University of Birjand, i.behravan@birjand.ac.ir

2 Department of Electrical Engineering, Faculty of Engineering, University of Birjand, hzahiri@birjand.ac.ir

3 Department of Electrical Engineering, Faculty of Engineering, University of Birjand,

4 KDD lab, ISTI-CNR, Pisa, Italy, roberto.trasarti@isti.cnr.it

Abstract

Background and Objectives: Big data refers to huge datasets with a large number of objects and a high number of dimensions. Mining such datasets and extracting knowledge from them is beyond the capability of conventional data mining algorithms, including clustering algorithms, classification algorithms, and feature selection methods.
Methods: Clustering is the process of dividing the data points of a dataset into groups (clusters) based on their similarities and dissimilarities. It is an unsupervised learning method that discovers useful information and hidden patterns in raw data. In this research, a new clustering method for big datasets is introduced, based on the Particle Swarm Optimization (PSO) algorithm. The proposed method is a two-stage algorithm: it first searches the solution space for the proper number of clusters, and then it searches for the positions of the centroids.
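The two-stage search described above can be illustrated with a small, pure-Python sketch. This is not the authors' implementation. The abstract does not specify the fitness functions, so the sketch makes two hypothetical assumptions:

- Stage 1 runs a one-dimensional PSO over a real value that is rounded to k. Each k is scored by the silhouette width of the partition that stage 2 produces for it.
- Stage 2 runs a PSO over the flattened coordinates of the k centroids and minimizes the sum of squared errors (SSE).

The names `pso`, `sse`, `silhouette`, `find_centroids` and `find_k` are illustrative, and so are the PSO coefficients (w = 0.72, c1 = c2 = 1.49).

```python
import math
import random
from typing import Callable, Sequence


def pso(fitness: Callable[[list], float], lo: Sequence[float], hi: Sequence[float],
        n_particles: int = 20, iters: int = 100, seed: int = 0,
        w: float = 0.72, c1: float = 1.49, c2: float = 1.49):
    """Minimize `fitness` over the box [lo, hi] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(lo)
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def unflatten(x, k, dim):
    """Split a flat particle position of length k*dim into k centroid tuples."""
    return [tuple(x[j * dim:(j + 1) * dim]) for j in range(k)]


def find_centroids(points, k, seed=0):
    """Stage 2 (assumed form): PSO over k*dim centroid coordinates, minimizing SSE."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)] * k
    hi = [max(p[d] for p in points) for d in range(dim)] * k
    best, _ = pso(lambda x: sse(points, unflatten(x, k, dim)), lo, hi, seed=seed)
    return unflatten(best, k, dim)


def silhouette(points, centroids):
    """Mean silhouette width of the nearest-centroid partition (-1.0 if < 2 clusters)."""
    labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton's silhouette is taken as 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for l2, other in clusters.items() if l2 != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def find_k(points, k_max=6, seed=0):
    """Stage 1 (hypothetical criterion): PSO over a real value rounded to k in
    [2, k_max], scored by the negated silhouette of the stage-2 centroids."""
    cache = {}

    def score(x):
        k = round(x[0])
        if k not in cache:  # evaluate each integer k only once
            cache[k] = -silhouette(points, find_centroids(points, k, seed))
        return cache[k]

    best, _ = pso(score, [2.0], [float(k_max)], n_particles=5, iters=10, seed=seed)
    return round(best[0])


if __name__ == "__main__":
    rng = random.Random(1)
    centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(20)]
    k = find_k(data)
    print("estimated k:", k)
    print("centroids:", find_centroids(data, k))
```

Stage 1 caches the score for each integer k, so it evaluates each candidate k at most once. On genuinely big datasets, the O(n^2) silhouette computation would need to be replaced by a cheaper validity index or computed on a sample.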
Results: The performance of the proposed method is evaluated on 13 synthetic datasets. Its performance is also compared with that of X-means using two evaluation metrics: the Rand index and normalized mutual information (NMI). The results demonstrate the superiority of the proposed method over X-means on all of the synthetic datasets. Furthermore, a biological microarray dataset is used to evaluate the proposed method in more depth. Finally, two large real-world mobility datasets are analyzed using the proposed clustering method. Both contain trajectories traveled by several cars in the city of Pisa: the first contains the trajectories recorded on Sundays, and the second contains those recorded on Mondays, over a period of 5 weeks. The results show that people choose more diverse destinations on Sundays, although fewer trajectories are recorded on that day.
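Both evaluation metrics mentioned above can be computed from the contingency table of the ground-truth and predicted labels. The following is a minimal pure-Python sketch with hypothetical function names. It normalizes NMI by the arithmetic mean of the two entropies. That choice is an assumption: other variants use the geometric mean or the maximum, and the abstract does not state which variant was used.

```python
import math
from collections import Counter


def rand_index(labels_true, labels_pred):
    """Rand index: fraction of point pairs on which two partitions agree.

    Computed from the contingency table in O(n) instead of enumerating all pairs:
    agreements = C(n,2) + 2*TP - (pairs together in truth) - (pairs together in prediction).
    Requires at least two points.
    """
    n = len(labels_true)
    total = math.comb(n, 2)
    tp = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    return (total + 2 * tp - same_true - same_pred) / total


def entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its cluster sizes."""
    return -sum(c / n * math.log(c / n) for c in counts)


def nmi(labels_true, labels_pred):
    """NMI normalized by the arithmetic mean of the two entropies (an assumed variant)."""
    n = len(labels_true)
    ct, cp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(c * n / (ct[a] * cp[b])) for (a, b), c in joint.items())
    h_true, h_pred = entropy(ct.values(), n), entropy(cp.values(), n)
    if h_true + h_pred == 0:
        return 1.0  # both partitions are a single cluster: treat them as identical
    return 2 * mi / (h_true + h_pred)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]
    print("Rand index:", rand_index(truth, found))
    print("NMI:", nmi(truth, found))
```

Both scores are invariant to how the clusters are numbered. That property is what makes them suitable for comparing a clustering result against ground-truth labels.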
Conclusion: Finding the number of clusters is a major challenge, especially for big datasets. The results achieved by the proposed method show its strong performance in detecting the number of clusters in high-dimensional and massive datasets. The results also demonstrate the power and effectiveness of swarm intelligence methods in solving hard and complex optimization problems.


======================================================================================================
Copyrights
©2018 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.
======================================================================================================



LETTERS TO EDITOR

Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for post-publication discussion and corrections. Through the Letters to Editor section, this allows debate after publication on the journal's site. Letters pertaining to manuscripts published in JECEI should be sent to the editorial office of JECEI within three months of online publication or before printed publication, except for critiques of original research. The following points should be considered before sending letters (comments) to the editor.


[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter indicating whether or not it is being submitted for publication.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.
