Document Type : Original Research Paper
Author
Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
Abstract
Background and Objectives: Numerical iterative methods are widely used to compute reachability probabilities and expected rewards in probabilistic model checking of Markov decision processes (MDPs). Several approaches have been proposed to improve the performance of these iterative methods. Reducing the total number of iterations is an important goal, which this paper pursues through several techniques.
Methods: In this paper, we consider MDPs with different degrees of non-determinism. We show that modified policy iteration outperforms the other standard iterative methods as the degree of non-determinism increases. We propose several novel methods to improve the performance of modified policy iteration. Our approach is to define several criteria that avoid useless iterations and updates in the modified policy iteration method. The first criterion uses the graphical structure of the model to determine the number of iterations performed after each policy modification. The second is a dynamic criterion that determines when a policy should be updated. The third proposed approach is to use several priority heaps to select states for value updates.
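To make the baseline concrete, the following is a minimal sketch of modified policy iteration for maximum reachability probabilities, the method the paper builds on. The toy MDP, the fixed sweep count `k`, and all identifiers are illustrative assumptions, not the paper's implementation; the paper's criteria would replace the fixed `k` and the plain state loop.

```python
# Minimal modified policy iteration for max reachability probabilities.
# An MDP maps each state to a list of actions; each action is a list of
# (successor, probability) pairs. The model below is a toy example, not
# one of the paper's case studies.

def modified_policy_iteration(mdp, goal, k=5, eps=1e-8):
    v = {s: (1.0 if s in goal else 0.0) for s in mdp}
    policy = {s: 0 for s in mdp if s not in goal}
    while True:
        # Partial policy evaluation: k value-update sweeps under the
        # current fixed policy (the paper's first criterion adapts k).
        for _ in range(k):
            for s in policy:
                v[s] = sum(p * v[t] for t, p in mdp[s][policy[s]])
        # Policy improvement: switch to a greedy action where it helps.
        changed = False
        for s in policy:
            vals = [sum(p * v[t] for t, p in a) for a in mdp[s]]
            best = max(range(len(vals)), key=vals.__getitem__)
            if vals[best] > vals[policy[s]] + eps:
                policy[s] = best
                changed = True
        if not changed:
            return v, policy

# State 2 is the goal, state 3 a non-goal sink. From state 0, action 0
# reaches the goal with probability 0.5; action 1 detours via state 1
# and reaches it with probability 0.9.
mdp = {
    0: [[(2, 0.5), (3, 0.5)], [(1, 0.9), (3, 0.1)]],
    1: [[(2, 1.0)]],
    2: [[(2, 1.0)]],
    3: [[(3, 1.0)]],
}
v, policy = modified_policy_iteration(mdp, goal={2})
```

On this model the method converges after one improvement step, selecting action 1 in state 0 with reachability probability 0.9.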
Results: Our proposed methods are implemented in the PRISM model checker and applied to several standard case studies. The experimental results show that the running times of our approaches are lower than those of the standard and previous methods for most case studies. The main reason is that our approaches reduce the total numbers of iterations and updates, which improves the performance of the method. While our approaches reduce the running times, the precision of the computations is preserved in most cases.
Conclusion: The proposed techniques are able to reduce the number of iterations and accelerate convergence to the optimal policies. They also prioritize the computations to reduce the total number of updates.
Keywords
- Expected rewards
- Markov decision processes
- Modified policy iteration
- Probabilistic model checking
- Reachability probabilities
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/
Publisher’s Note
JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Publisher
Shahid Rajaee Teacher Training University