Document Type: Original Research Paper


Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran



Background and Objectives: Cloud computing has brought a new dimension to the IT world. It allows a large number of virtual machines to be employed to run compute-intensive applications. A failure in a running application can halt system operations, and recovery then requires restarting the system.
Methods: In this paper, a fault tolerance method for High-Performance Computing (HPC) systems in the cloud, called Daemon-COA-MMT (DCM), is proposed to predict and avoid failures. The method enhances the Daemon Fault Tolerance technique and uses COA-MMT for load balancing. It consists of four modules, which are used to determine the host state. When the system is in the alarm state, the current host may be about to fail, so the most suitable host for migration is selected and process-level migration is performed. The method reduces migration overhead, improves system performance, makes use of underutilized hosts instead of leasing new ones, balances load appropriately, spreads the use of hardware resources evenly across hosts, maintains a focus on QoS and SLA, and significantly reduces energy consumption.
Results: The simulation results show that the proposed method reduces average job makespan, average response time, and average task execution cost by 18.06%, 35.68%, and 24.6%, respectively. The proposed fault tolerance algorithm also reduces energy consumption by 30% and lowers the failure rate of HPC systems.
Conclusion: In this study, the Daemon Fault Tolerance technique has been enhanced, and COA-MMT has been used for load balancing in high-performance computing in the cloud.
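
The alarm-then-migrate flow described under Methods can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Host` fields, the two thresholds, and the greedy least-utilized selection are hypothetical stand-ins, since the abstract gives neither the alarm criteria nor the details of the COA-MMT selection step. The sketch only shows the control flow: classify each host, and for a host in the alarm state pick a healthy target that can absorb the load.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class HostState(Enum):
    NORMAL = "normal"
    ALARM = "alarm"


@dataclass
class Host:
    name: str
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0
    temperature_c: float    # hypothetical health reading, degrees Celsius


# Hypothetical thresholds; the abstract does not state the real alarm criteria.
ALARM_TEMPERATURE_C = 80.0
ALARM_CPU_UTILIZATION = 0.95


def classify(host: Host) -> HostState:
    """Flag a host as ALARM when any monitored reading crosses its threshold."""
    if (host.temperature_c >= ALARM_TEMPERATURE_C
            or host.cpu_utilization >= ALARM_CPU_UTILIZATION):
        return HostState.ALARM
    return HostState.NORMAL


def select_target(hosts: Sequence[Host], source: Host, load: float) -> Optional[Host]:
    """Pick the least-utilized healthy host that can absorb `load`.

    A greedy stand-in for the COA-MMT selection step, not the cuckoo
    optimization itself.
    """
    candidates = [
        h for h in hosts
        if h is not source
        and classify(h) is HostState.NORMAL
        and h.cpu_utilization + load < ALARM_CPU_UTILIZATION
    ]
    return min(candidates, key=lambda h: h.cpu_utilization, default=None)


def plan_migration(hosts: Sequence[Host], load: float) -> List[Tuple[str, str]]:
    """Return (source, target) name pairs for every host in the alarm state."""
    plan = []
    for host in hosts:
        if classify(host) is HostState.ALARM:
            target = select_target(hosts, host, load)
            if target is not None:
                plan.append((host.name, target.name))
    return plan
```

Preferring an existing underutilized host as the target mirrors the stated goal of reusing such hosts instead of leasing new ones. When no healthy host has spare capacity, this sketch simply plans no migration.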


