Fault Tolerance to Balance for Messaging Layers in Communication Society
- Resource Type
- Conference
- Authors
- Mikhail, Abrosimov; Kareem, Hayder Hussein; Mahajan, Hemant
- Source
- 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), pp. 1-5, Aug. 2017
- Subject
- Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Fault tolerant systems
Fault tolerance
Message passing
Standards
Hardware
Libraries
Balancing
Check pointing
Fault Tolerance
HPC
MPI
Message Passing
Communication Society
- Language
Present-day communication societies rely on High-Performance Computing (HPC) systems to balance the messaging layers. However, HPC systems are vulnerable to various software and hardware failures, and resuming operation after a failure takes extra effort. Within the communication framework, these vulnerabilities create an imbalance across the message-passing layers. Hence, such HPC systems need fault tolerance methods. Several solutions have been proposed for efficient fault tolerance in HPC systems, but they suffer from various limitations. Checkpointing/restart is the most commonly studied technique. The goal of this research is therefore to present a new, efficient checkpointing/restart method of fault tolerance in HPC systems for MPI (Message Passing Interface) applications, in order to balance the messaging layers. Checkpointing is the most important function of a fault-tolerant system, but its overhead is the main limitation of the method: taking checkpoints requires extra storage and adds to the program's execution time. Hence, we develop a new, enhanced checkpoint/restart technique for MPI applications. The checkpointing method is supported by an efficient and trusted distributed storage system, so that the checkpoints ensure data availability in the event of a hardware failure.
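The abstract does not describe the mechanism itself, so the following is only a minimal sketch of the general checkpoint/restart idea with replicated checkpoint storage, not the authors' method. It makes several assumptions: a single Python process stands in for one MPI rank, local directories stand in for nodes of a distributed storage system, and the names `save_checkpoint`, `load_checkpoint`, `run`, `replica_dirs`, and `fail_at` are hypothetical.

```python
import os
import pickle
import shutil
import tempfile


def save_checkpoint(state, replica_dirs, name="rank0.ckpt"):
    """Write `state` to every replica directory (assumed stand-ins for storage nodes)."""
    payload = pickle.dumps(state)
    for d in replica_dirs:
        os.makedirs(d, exist_ok=True)
        tmp_path = os.path.join(d, name + ".tmp")
        with open(tmp_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Rename atomically so a crash never leaves a half-written checkpoint.
        os.replace(tmp_path, os.path.join(d, name))


def load_checkpoint(replica_dirs, name="rank0.ckpt"):
    """Return the state from the first readable replica, or None if none survives."""
    for d in replica_dirs:
        try:
            with open(os.path.join(d, name), "rb") as f:
                return pickle.load(f)
        except (OSError, pickle.UnpicklingError, EOFError):
            continue  # this replica is lost or damaged; try the next one
    return None


def run(total_steps, replica_dirs, interval=10, fail_at=None):
    """Sum 0..total_steps-1, checkpointing every `interval` steps.

    On start, resume from the latest checkpoint if one exists.
    `fail_at` injects a simulated failure at that step (illustration only).
    """
    state = load_checkpoint(replica_dirs) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(state, replica_dirs)
    return state["acc"]


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    replicas = [os.path.join(base, "node_a"), os.path.join(base, "node_b")]
    try:
        run(100, replicas, fail_at=57)   # crashes partway through
    except RuntimeError:
        pass
    shutil.rmtree(replicas[0])           # one storage node is lost
    print(run(100, replicas))            # restarts from the surviving replica
    shutil.rmtree(base)
```

The sketch also shows the trade-off the abstract mentions: a smaller `interval` loses less work on failure but writes checkpoints more often, and each additional replica multiplies the storage used.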