kvm: reset KVM host on heartbeat failure (#2984)

On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This commit is contained in:
Rohit Yadav 2018-10-30 15:13:59 +05:30 committed by GitHub
parent f7bc5807a3
commit c6e53f6cc6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 3 additions and 3 deletions

View File

@ -35,7 +35,7 @@ public class KVMHABase {
protected long _heartBeatUpdateTimeout = 60000;
protected long _heartBeatUpdateFreq = 60000;
protected long _heartBeatUpdateMaxTries = 5;
protected long _heartBeatUpdateRetrySleep = 15000;
protected long _heartBeatUpdateRetrySleep = 10000;
public static enum PoolType {
PrimaryStorage, SecondaryStorage

View File

@ -155,10 +155,10 @@ then
exit 0
elif [ "$cflag" == "1" ]
then
/usr/bin/logger -t heartbeat "kvmheartbeat.sh stopped cloudstack-agent because it was unable to write the heartbeat to the storage."
/usr/bin/logger -t heartbeat "kvmheartbeat.sh will reboot system because it was unable to write the heartbeat to the storage."
sync &
sleep 5
service cloudstack-agent stop
echo b > /proc/sysrq-trigger
exit $?
else
write_hbLog