mirror of https://github.com/apache/cloudstack.git
kvm: reset KVM host on heartbeat failure (#2984)
In testing, I found that when the NFS server fails, the kvmheartbeat.sh script fails to write the heartbeat and only stops the agent. Any HA VMs could then be started on different hosts, and recovery of the NFS server could lead to a state where an HA-enabled VM runs on two hosts, which can cause disk corruption. In most cases, VM disk corruption is worse than VM downtime. I've kept the sleep interval between check rounds but reduced it to 10s. The change in behaviour was introduced in #2722.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
parent f7bc5807a3
commit c6e53f6cc6
@@ -35,7 +35,7 @@ public class KVMHABase {
     protected long _heartBeatUpdateTimeout = 60000;
     protected long _heartBeatUpdateFreq = 60000;
     protected long _heartBeatUpdateMaxTries = 5;
-    protected long _heartBeatUpdateRetrySleep = 15000;
+    protected long _heartBeatUpdateRetrySleep = 10000;
 
     public static enum PoolType {
         PrimaryStorage, SecondaryStorage
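To make the effect of the shorter retry sleep concrete, here is a minimal sketch of a retry-then-fence loop driven by the two values in the hunk above. It is not the actual CloudStack monitor code: the class `HeartbeatRetrySketch`, its methods, and the `writeHeartbeat` and `fenceAction` callbacks are hypothetical names introduced for illustration, and it assumes a sleep after every failed attempt.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; names and structure are assumptions, not CloudStack code.
final class HeartbeatRetrySketch {
    // Values taken from the KVMHABase hunk above (milliseconds for the sleep).
    static final long MAX_TRIES = 5;
    static final long RETRY_SLEEP_MS = 10000;

    /**
     * Tries to write the heartbeat up to maxTries times, sleeping after each
     * failed attempt. Returns true as soon as one write succeeds. If every
     * attempt fails, runs fenceAction (for example, a host reset) and returns false.
     */
    static boolean heartbeatOrFence(BooleanSupplier writeHeartbeat, Runnable fenceAction,
                                    long maxTries, long retrySleepMs) {
        for (long attempt = 1; attempt <= maxTries; attempt++) {
            if (writeHeartbeat.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(retrySleepMs);
            } catch (InterruptedException e) {
                // Interrupted while waiting: restore the flag and give up without fencing.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        fenceAction.run();
        return false;
    }

    /** Upper bound on time spent sleeping between attempts before fencing. */
    static long worstCaseRetryMillis(long maxTries, long retrySleepMs) {
        return maxTries * retrySleepMs;
    }
}
```

Under these assumptions, `worstCaseRetryMillis(5, 15000)` gives 75000 ms for the old value and `worstCaseRetryMillis(5, 10000)` gives 50000 ms for the new one, so a host that cannot reach its storage is fenced about 25 seconds sooner, not counting the time each write attempt itself takes.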
@@ -155,10 +155,10 @@ then
   exit 0
 elif [ "$cflag" == "1" ]
 then
-  /usr/bin/logger -t heartbeat "kvmheartbeat.sh stopped cloudstack-agent because it was unable to write the heartbeat to the storage."
+  /usr/bin/logger -t heartbeat "kvmheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage."
   sync &
   sleep 5
-  service cloudstack-agent stop
+  echo b > /proc/sysrq-trigger
   exit $?
 else
   write_hbLog