Document Type: Original Research Paper

Authors

1 Department of Data and Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.

2 Department of Data and Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.

10.22061/jecei.2020.7247.377

Abstract

Background and Objectives: The n-similarity problem is defined as measuring the similarity among n objects and finding the group of n objects in a dataset that are most similar to each other. This problem has become an important issue in information retrieval and data mining. The theory behind this concept is mathematically proven, but in practice it has high memory complexity and is very time consuming. Moreover, the solutions found by metaheuristics are not exact.
Methods: This paper proposes an exact method for solving the n-similarity problem that reduces memory complexity and decreases execution time through parallelism using OpenMP. The experiments are performed on the application of text document resemblance.
Results: The memory complexity of the proposed method is shown to decrease to , and the experimental results show that the method speeds up the computations by about 5 times.
Conclusion: The simulation results of the proposed method show a good improvement in speed, memory usage, and scalability compared with the previous exact method.
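
To make the problem statement concrete, the following is a minimal sketch of an exhaustive (exact) search for the most similar group of n objects. It is not the method proposed in the paper and does not use OpenMP. The function names (`cosine`, `group_similarity`, `most_similar_group`) are hypothetical, and scoring a group by its mean pairwise cosine similarity is an assumption made only for illustration; the paper's own n-similarity measure may differ.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similarity(vectors, group):
    """ASSUMPTION: a group's score is its mean pairwise cosine similarity."""
    pairs = list(combinations(group, 2))
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def most_similar_group(vectors, n):
    """Score every n-subset of the dataset and return (best_indices, best_score)."""
    if not 2 <= n <= len(vectors):
        raise ValueError("n must be between 2 and the number of objects")
    best_group, best_score = None, float("-inf")
    # combinations() yields subsets lazily, so only the current best is kept
    # in memory; the running time still grows with the number of n-subsets.
    for group in combinations(range(len(vectors)), n):
        score = group_similarity(vectors, group)
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score
```

In a text-resemblance setting, `vectors` could hold term-weight vectors of documents, and `most_similar_group(vectors, 3)` would return the indices of the three documents that score highest under the assumed measure. Because every candidate group is scored independently, the loop is a natural candidate for the kind of parallelism the abstract mentions.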
