Document Type : Original Research Paper

Authors

Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran

Abstract

Background and Objectives: Cloud computing has brought a new dimension to the IT world. It allows a large number of virtual machines to be employed to run compute-intensive applications. A failure in any running application disrupts system operation, and recovering from it requires restarting the system.
Methods: In this paper, a fault tolerance method for High-Performance Computing (HPC) systems in the cloud, called Daemon-COA-MMT (DCM), is proposed to predict and avoid failures. The proposed method enhances the Daemon Fault Tolerance technique and uses COA-MMT for load balancing. It consists of four modules, which are used to determine the host state. When the system is in the alarm state, the current host may face failure; in that case, the optimal host for migration is selected and process-level migration is performed. The method reduces migration overhead, improves system performance, uses underutilized hosts instead of leasing new ones, balances load so that hardware resources are used evenly across all hosts, keeps the focus on QoS and SLA, and significantly decreases energy consumption.
Results: The simulation results show that the proposed method reduces average job makespan, average response time, and average task execution cost by 18.06%, 35.68%, and 24.6%, respectively. The proposed fault tolerance algorithm also reduces energy consumption by 30% and decreases the failure rate of HPC systems.
Conclusion: In this study, the Daemon Fault Tolerance technique was enhanced, and COA-MMT was used for load balancing in high-performance computing in the cloud.
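
The control flow summarized in the Methods paragraph (determine the host state, and on an alarm pick a target host and migrate) can be illustrated with a minimal sketch. This is not the authors' DCM implementation: the thresholds, the `Host` fields, and all function names are hypothetical, and a simple greedy least-loaded choice stands in for the COA-MMT load balancer, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the abstract does not state numeric values.
ALARM_THRESHOLD = 0.85      # utilization at or above this means "alarm"
UNDERUSED_THRESHOLD = 0.40  # utilization below this means "underutilized"


@dataclass
class Host:
    name: str
    cpu_utilization: float  # fraction in [0, 1]; assumed health indicator


def host_state(host: Host) -> str:
    """Classify a host as 'alarm', 'underutilized', or 'normal'."""
    if host.cpu_utilization >= ALARM_THRESHOLD:
        return "alarm"
    if host.cpu_utilization < UNDERUSED_THRESHOLD:
        return "underutilized"
    return "normal"


def pick_target(source: Host, hosts: List[Host], load: float) -> Optional[Host]:
    """Greedy stand-in for the optimizer: the least-loaded other host that
    stays below the alarm threshold after receiving `load`."""
    candidates = [
        h for h in hosts
        if h is not source and h.cpu_utilization + load < ALARM_THRESHOLD
    ]
    return min(candidates, key=lambda h: h.cpu_utilization, default=None)


def migrate_if_alarmed(source: Host, hosts: List[Host], load: float) -> Optional[Host]:
    """Move a process of size `load` off `source` if it is in the alarm state.

    Returns the target host, or None if no migration took place.
    """
    if host_state(source) != "alarm":
        return None
    target = pick_target(source, hosts, load)
    if target is None:
        return None
    # Model process-level migration as shifting the load between hosts.
    source.cpu_utilization -= load
    target.cpu_utilization += load
    return target
```

In this sketch, preferring an existing lightly loaded host mirrors the stated goal of using underutilized hosts rather than leasing new ones; a real system would replace `pick_target` with the actual optimizer and would monitor more than CPU utilization.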


======================================================================================================
Copyrights
©2020 The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.
======================================================================================================
