Document Type : Original Research Paper


Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran


Background and Objectives: Cloud Computing has brought a new dimension to the IT world. The technology of cloud computing allows employing a large number of Virtual Machines to run intensive applications. Each failure in running applications fails system operations. To solve the problem, it is required to restart the systems.
Methods: In this paper, to predict and avoid failure in HPC systems, a method of fault tolerance to High-Performance Computing systems (HPC) in the cloud is called Daemon-COA-MMT (DCM), has been proposed. In the proposed method, the Daemon Fault Tolerance technique has been enhanced, and COA-MMT has been utilized for load balancing. The method consists of four modules, which are used to determine the host state. When the system is in the alarm state, the current host may face failure. Then the most optimal host for migration is selected, and process-level migration is performed. The method causes decreased migration overheads, decreased system performance speed, optimal use of underutilized hosts instead of leasing new hosts, appropriate load balancing, equal use of hardware resources by all hosts, focusing on QoS and SLA, and the significant decrease of energy consumption.
Results: The simulation results revealed that in terms of parameters, the proposed method declines average job makespan, average response time, and average task execution cost by 18.06%, 35.68%, and 24.6%, respectively. The proposed fault tolerance algorithm has improved energy consumption by 30% and decreased the HPC systems' failure rate.
Conclusion: In this study, the Daemon Fault Tolerance technique has been enhanced, and COA-MMT has been utilized for load balancing in high performance computing in the cloud computing.


Main Subjects

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit:


Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional afflictions.



Shahid Rajaee Teacher Training University


Journal of Electrical and Computer Engineering Innovations (JECEI) welcomes letters to the editor for the post-publication discussions and corrections which allows debate post publication on its site, through the Letters to Editor. Letters pertaining to manuscript published in JECEI should be sent to the editorial office of JECEI within three months of either online publication or before printed publication, except for critiques of original research. Following points are to be considering before sending the letters (comments) to the editor.

[1] Letters that include statements of statistics, facts, research, or theories should include appropriate references, although more than three are discouraged.

[2] Letters that are personal attacks on an author rather than thoughtful criticism of the author’s ideas will not be considered for publication.

[3] Letters can be no more than 300 words in length.

[4] Letter writers should include a statement at the beginning of the letter stating that it is being submitted either for publication or not.

[5] Anonymous letters will not be considered.

[6] Letter writers must include their city and state of residence or work.

[7] Letters will be edited for clarity and length.