Background and Objectives: Cloud computing has brought a new dimension to the IT world. It allows a large number of Virtual Machines to be employed to run intensive applications. A failure in a running application causes system operations to fail, and recovering from it requires restarting the system.
Methods: This paper proposes Daemon-COA-MMT (DCM), a fault tolerance method that predicts and avoids failures in High-Performance Computing (HPC) systems in the cloud. The proposed method enhances the Daemon Fault Tolerance technique and uses COA-MMT for load balancing. It consists of four modules, which are used to determine the host state. When the system is in the alarm state, the current host may face failure; the most suitable host for migration is then selected, and process-level migration is performed. The method reduces migration overhead, improves system performance speed, makes use of underutilized hosts instead of leasing new ones, balances load appropriately so that all hosts use hardware resources equally, focuses on QoS and SLA, and significantly decreases energy consumption.
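The host-state check and target selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds, the `Host` fields, and the function names are hypothetical, and a simple greedy rule (prefer underutilized hosts, then the shortest estimated migration time) stands in for the COA-MMT optimization, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the abstract does not state the actual values.
ALARM_THRESHOLD = 0.8
UNDERUTILIZED_THRESHOLD = 0.3


@dataclass
class Host:
    name: str
    cpu_utilization: float  # fraction in [0, 1]
    migration_time: float   # estimated seconds to migrate a process to this host


def host_state(host: Host) -> str:
    """Classify a host as 'alarm', 'underutilized', or 'normal' (assumed states)."""
    if host.cpu_utilization >= ALARM_THRESHOLD:
        return "alarm"
    if host.cpu_utilization <= UNDERUTILIZED_THRESHOLD:
        return "underutilized"
    return "normal"


def select_target(hosts: List[Host], source: Host) -> Optional[Host]:
    """Pick a migration target for processes on `source`.

    Greedy stand-in for COA-MMT: exclude hosts that are themselves in the
    alarm state, prefer underutilized hosts, then minimize migration time.
    Returns None when no safe target exists.
    """
    candidates = [
        h for h in hosts
        if h is not source and host_state(h) != "alarm"
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda h: (host_state(h) != "underutilized", h.migration_time),
    )
```

In this sketch, a monitoring loop would call `host_state` on each host and, for any host in the alarm state, call `select_target` before starting process-level migration. Preferring underutilized hosts mirrors the stated goal of reusing existing capacity instead of leasing new hosts.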
Results: The simulation results show that the proposed method reduces average job makespan, average response time, and average task execution cost by 18.06%, 35.68%, and 24.6%, respectively. The proposed fault tolerance algorithm also reduces energy consumption by 30% and lowers the failure rate of HPC systems.
Conclusion: In this study, the Daemon Fault Tolerance technique has been enhanced, and COA-MMT has been utilized for load balancing in high-performance computing in the cloud.
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/
JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Shahid Rajaee Teacher Training University