An Adaptive Failure Detection Method for Highly Reliable Distributed Computing Systems

  • Abstract: An adaptive heartbeat failure detection method is proposed for highly reliable distributed computing systems. Based on the load of the computing nodes and the network transmission delay, it dynamically estimates the heartbeat timeout and negotiates changes to the heartbeat sending interval, so that the detector follows changes in system state and makes fewer false detections. Simulation results show that, compared with a conventional failure detector and with NFD-E, the method produces fewer false detections and shorter detection times. It adapts to changes in the state of a highly reliable distributed computing system and strikes a good balance between the timeliness and the accuracy of detection.
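
The abstract does not give the estimation formula, so the following is only a minimal sketch of how a heartbeat timeout could be estimated dynamically from observed delays. It swaps in the well-known Jacobson-style smoothed mean and deviation estimator (the scheme used for TCP retransmission timers), not the paper's own method. The class name `AdaptiveTimeout`, its method names, and the gain values `alpha`, `beta`, and `k` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveTimeout:
    """Jacobson-style heartbeat timeout estimator (illustrative, not the paper's formula)."""

    alpha: float = 0.125  # gain for the smoothed delay (assumed value)
    beta: float = 0.25    # gain for the delay deviation (assumed value)
    k: float = 4.0        # safety multiplier on the deviation (assumed value)
    smoothed: Optional[float] = None
    deviation: float = 0.0

    def observe(self, delay: float) -> None:
        """Fold one measured heartbeat delay (in seconds) into the estimate."""
        if self.smoothed is None:
            # First sample: initialise the mean and a conservative deviation.
            self.smoothed = delay
            self.deviation = delay / 2
            return
        # Update the deviation first, using the previous smoothed delay.
        self.deviation = (1 - self.beta) * self.deviation + self.beta * abs(self.smoothed - delay)
        self.smoothed = (1 - self.alpha) * self.smoothed + self.alpha * delay

    def timeout(self) -> float:
        """Current timeout: smoothed delay plus k times the deviation."""
        if self.smoothed is None:
            raise ValueError("no heartbeat delay has been observed yet")
        return self.smoothed + self.k * self.deviation

    def is_suspected(self, elapsed: float) -> bool:
        """Suspect the monitored node once the silence exceeds the timeout."""
        return elapsed > self.timeout()
```

In this sketch, load-induced and network-induced delays are not separated: both simply show up as larger observed delays, which lengthens the timeout and so trades detection time for fewer false suspicions. Negotiating the sending interval, which the abstract also mentions, is not modelled here.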

     
