Document Type : Original Research Paper

Authors

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

Abstract

Background and Objectives: Most of the recent dialogue policy learning ‎methods are based on reinforcement learning (RL). However, the basic RL ‎algorithms like deep Q-network, have drawbacks in environments with ‎large state and action spaces such as dialogue systems. Most of the ‎policy-based methods are slow, cause of the estimating of the action value ‎using the computation of the sum of the discounted rewards for each ‎action. In value-based RL methods, function approximation errors lead to ‎overestimation in value estimation and finally suboptimal policies. There ‎are works that try to resolve the mentioned problems using combining RL ‎methods, but most of them were applied in the game environments, or ‎they just focused on combining DQN variants. This paper for the first time ‎presents a new method that combines actor-critic and double DQN named ‎Double Actor-Critic (DAC), in the dialogue system, which significantly ‎improves the stability, speed, and performance of dialogue policy learning. ‎
Methods: In the actor critic to overcome the slow learning of normal DQN, ‎the critic unit approximates the value function and evaluates the quality ‎of the policy used by the actor, which means that the actor can learn the ‎policy faster. Moreover, to overcome the overestimation issue of DQN, ‎double DQN is employed. Finally, to have a smoother update, a heuristic ‎loss is introduced that chooses the minimum loss of actor-critic and ‎double DQN. ‎
Results: Experiments in a movie ticket booking task show that the ‎proposed method has more stable learning without drop after ‎overestimation and can reach the threshold of learning in fewer episodes ‎of learning. ‎
Conclusion: Unlike previous works that mostly focused on just proposing ‎a combination of DQN variants, this study combines DQN variants with ‎actor-critic to benefit from both policy-based and value-based RL methods ‎and overcome two main issues of both of them, slow learning and ‎overestimation. Experimental results show that the proposed method can ‎make a more accurate conversation with a user as a dialogue policy ‎learner.‎

Keywords

Main Subjects

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

 

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

 

Publisher

Shahid Rajaee Teacher Training University


LETTERS TO EDITOR

Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for the post-publication discussions and corrections which allows debate post publication on its site, through the Letters to Editor. Letters pertaining to manuscript published in JECEI should be sent to the editorial office of JECEI within three months of either online publication or before printed publication, except for critiques of original research. Following points are to be considering before sending the letters (comments) to the editor.


[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter stating that it is being submitted either for publication or not.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.

CAPTCHA Image