Deep Reinforcement Learning for Efficient Multilingual Dialogue Management

Nasri-Lowshani, Mohammad Javad; Salimi Sartakhti, Javad; Ebrahimpour-Komole, Hossein

doi:10.22061/jecei.2025.11348.814

Articles in Press

Document Type: Original Research Paper

Authors

Department of Artificial Intelligence, Faculty of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

Abstract

Background and Objectives: Developing efficient task-oriented dialogue systems that can handle multilingual interactions is a growing area of research in natural language processing (NLP). In this paper, we propose SenSimpleDS, a joint task-oriented dialogue system based on deep reinforcement learning and designed for multilingual conversations.
Methods: The system uses a deep Q-network for policy learning and the SBERT model to represent the dialogue environment. We introduce two variants, SenSimpleDS+ and SenSimpleDS-NSP, which modify the ε-greedy method and use BERT's next sentence prediction (NSP) to refine the reward function. These methods are evaluated on datasets in English, Persian, Spanish, and German, and compared with the baseline methods SimpleDS and SCGSimpleDS.
Results: Our experimental results show that the proposed methods outperform the baselines in average collected reward and require fewer learning steps to reach optimal dialogue policies. Notably, incorporating NSP significantly improves performance by optimizing reward collection. The multilingual SenSimpleDS further demonstrates the system’s ability to operate across languages, using a random forest classifier for language detection and MPNet for environment construction. In addition to the system evaluations, we introduce a new Persian dataset for task-oriented dialogue in the restaurant domain, expanding the resources available for developing dialogue systems in low-resource languages.
Conclusion: SenSimpleDS, a joint task-oriented dialogue system based on deep reinforcement learning, outperforms the baseline methods by combining deep Q-networks with SBERT. Integrating next sentence prediction (NSP) significantly enhances reward optimization, enabling faster convergence to optimal dialogue policies. This work establishes a foundation for future research in multilingual dialogue systems, with potential applications across diverse service domains.
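
As a rough illustration of two ideas named in the abstract, the Python sketch below shows ε-greedy action selection with a decaying exploration rate and a reward that blends a task signal with a coherence score. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how SenSimpleDS+ modifies ε-greedy or how the NSP score enters the reward, so the linear decay schedule, the `weight` parameter, and all function names here are hypothetical. The `coherence_prob` argument merely stands in for a probability that an NSP model might return.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from start to end (hypothetical schedule)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def shaped_reward(task_reward, coherence_prob, weight=0.5):
    """Add a weighted coherence score in [0, 1] to the task reward.

    coherence_prob is a placeholder for an NSP-style probability; the
    additive form and the weight are assumptions, not the paper's formula.
    """
    return task_reward + weight * coherence_prob
```

With `epsilon` set to 0 the agent always exploits, so `epsilon_greedy([0.1, 0.9, 0.3], 0.0)` selects action 1; early in training `decayed_epsilon` keeps the value near `start`, so most actions are exploratory.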

Main Subjects

Natural Language Processing

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Publisher

Shahid Rajaee Teacher Training University

Journal of Electrical and Computer Engineering Innovations (JECEI)

Articles in Press, Accepted Manuscript
Available Online from 04 May 2025