An Effective Ensemble of Deep and Machine Learning Methods for Classifying the Expertise Shape of CQA Users

Nemati, S.

doi:10.22061/jecei.2024.10621.724

Document Type: Original Research Paper

Author

S. Nemati

Computer Engineering Department, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran.

Abstract

Background and Objectives: Community question-answering (CQA) websites have become increasingly popular as platforms for individuals to seek and share knowledge. Identifying users with a particular shape of expertise on CQA websites is a beneficial task for both companies and individuals. Specifically, finding those who have a general understanding of certain areas but lack expertise in other fields is crucial for companies that are planning internship programs. These users, called dash-shaped users, are willing to work for low wages and have the potential to quickly develop into skilled professionals, thus minimizing the risk of unsuccessful recruitment. With their vast number of users, CQA websites are a valuable resource for finding individuals with various levels of expertise. This study is the first to directly classify CQA users based solely on the textual content of their posts.
Methods: To achieve this objective, we propose an ensemble of advanced deep learning algorithms and traditional machine learning methods for the binary classification of CQA users into two categories: those with dash-shaped expertise and those without. In the proposed method, we use stacked generalization to fuse the results of the deep and machine learning methods. To evaluate the effectiveness of our approach, we conducted extensive experiments on three large datasets focused on Android, C#, and Java topics extracted from the Stack Overflow website.
Results: The results on four Stack Overflow datasets demonstrate that our ensemble method not only outperforms baseline methods, including seven traditional machine learning models and six deep models, but also exceeds state-of-the-art deep models by an average of 10% in accuracy and F1-measure.
Conclusion: The results confirm that users of CQA websites can be classified using only the textual content of their questions. Specifically, using the contextual content of the questions, the proposed model can detect dash-shaped users precisely. Moreover, the proposed model is not limited to detecting dash-shaped users. It can also classify other shapes of expertise, such as T- and C-shaped users, which are valuable for forming agile software teams. Additionally, our model can be used as a filter method for downstream applications, like intern recommendations.
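
The stacking step named in the Methods can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's implementation: the two base learners (a naive Bayes model and a Jaccard-similarity scorer), the logistic meta-learner, the toy posts, and every function and class name are assumptions chosen for brevity. They stand in for the deep and traditional models the abstract mentions. The sketch shows only the general idea of stacked generalization, in which a meta-learner is trained on out-of-fold predictions of the base learners.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


class NaiveBayes:
    """Level-0 learner: multinomial naive Bayes with add-one smoothing."""
    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, doc):
        n = sum(self.priors.values())
        logp = {}
        for y in (0, 1):
            total = sum(self.counts[y].values()) + len(self.vocab)
            logp[y] = math.log((self.priors[y] + 1) / (n + 2))
            for tok in tokenize(doc):
                if tok in self.vocab:
                    logp[y] += math.log((self.counts[y][tok] + 1) / total)
        return sigmoid(logp[1] - logp[0])  # P(label = 1 | doc)


class JaccardNeighbors:
    """Level-0 learner: summed Jaccard similarity to each class's posts."""
    def fit(self, docs, labels):
        self.train = [(set(tokenize(d)), y) for d, y in zip(docs, labels)]
        return self

    def predict_proba(self, doc):
        toks, score = set(tokenize(doc)), [0.0, 0.0]
        for other, y in self.train:
            score[y] += len(toks & other) / max(1, len(toks | other))
        return score[1] / sum(score) if sum(score) else 0.5


class LogisticMeta:
    """Level-1 (meta) learner: logistic regression trained by SGD."""
    def fit(self, rows, labels, epochs=300, lr=0.5):
        self.w, self.b = [0.0] * len(rows[0]), 0.0
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                err = self.predict_proba(x) - y
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err
        return self

    def predict_proba(self, x):
        return sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x)))


def fit_stack(docs, labels, learner_types, k=2):
    """Train base learners and a meta-learner on out-of-fold predictions."""
    n = len(docs)
    meta_rows = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [t().fit([docs[i] for i in train], [labels[i] for i in train])
                  for t in learner_types]
        for i in range(fold, n, k):  # held-out posts of this fold
            meta_rows[i] = [m.predict_proba(docs[i]) for m in models]
    meta = LogisticMeta().fit(meta_rows, labels)
    base = [t().fit(docs, labels) for t in learner_types]  # refit on all data
    return base, meta


def predict(base, meta, doc):
    return int(meta.predict_proba([m.predict_proba(doc) for m in base]) >= 0.5)


# Hypothetical posts; 1 = dash-shaped, 0 = otherwise. Labels are ordered so each
# interleaved fold sees both classes; real data would call for stratified folds.
docs = [
    "how do i start learning android basics",
    "what is a class in java for beginners",
    "optimizing jvm garbage collector pause times in production",
    "lock free concurrent queue memory ordering in c#",
    "how do i print text in c# basics",
    "profiling android render thread jank with systrace",
]
labels = [1, 1, 0, 0, 1, 0]
base, meta = fit_stack(docs, labels, [NaiveBayes, JaccardNeighbors])
preds = [predict(base, meta, d) for d in docs]
print(preds)
```

Training the meta-learner on held-out predictions, rather than on predictions for posts the base learners have already seen, is what keeps it from simply trusting whichever base model overfits most. In practice, a library implementation of stacking with stronger base models would replace these toy classes.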

Main Subjects

Data mining

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Publisher

Shahid Rajaee Teacher Training University

Journal of Electrical and Computer Engineering Innovations (JECEI)

Volume 12, Issue 2
July 2024
Pages 409-424
