cloudstack/scripts/vm/hypervisor
James Peru d603b260c4 KVM: make storage heartbeat fence action configurable
The KVM agent's storage heartbeat scripts (kvmheartbeat.sh and
kvmspheartbeat.sh) hard-code an immediate kernel-level reboot via
'echo b > /proc/sysrq-trigger' when a heartbeat write to primary storage
times out. This bypasses all OS-level shutdown protections, drops every
running VM on the host instantly, and triggers HA cascades onto
surviving hosts.

For NFS shared storage the binary "heartbeat-write-failed = host-is-dead"
heuristic is reasonable. For LINSTOR/DRBD or other replicated local
storage, the same disk serves application I/O, replication I/O and
heartbeat I/O simultaneously, so a transient I/O contention spike can
time out the heartbeat write even though the host is healthy.
The result is false-positive sysrq fencing.

This change adds a new agent.properties option:

    kvm.heartbeat.fence.action = reboot | graceful-reboot
                               | restart-agent | log-only

Default value is "reboot" so existing deployments keep their current
behavior. Operators on replicated storage backends can choose a less
destructive action:

  - graceful-reboot: 'systemctl reboot' instead of sysrq, allowing VMs
    a chance to shut down cleanly
  - restart-agent: restart cloudstack-agent only, preserving running VMs
  - log-only: log + alert, no automatic action
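
The mapping from action value to effect can be sketched as a small shell
function. This is a hypothetical illustration, not the code in this
commit: the function name fence_command, the exact
'systemctl restart cloudstack-agent' invocation, and the fallback to
"reboot" for unrecognized values are all assumptions. The function prints
the command instead of running it, so it is safe to try on a live host.

```shell
#!/bin/bash
# Hypothetical sketch, not the code from this commit.
# Maps a kvm.heartbeat.fence.action value to the command it would run.
# The command is printed, never executed.

fence_command() {
    local action="${1:-reboot}"   # an unset value keeps the legacy default
    case "$action" in
        reboot)
            # Legacy behavior: immediate kernel-level reboot via sysrq.
            echo 'echo b > /proc/sysrq-trigger'
            ;;
        graceful-reboot)
            # Lets VMs attempt a clean shutdown first.
            echo 'systemctl reboot'
            ;;
        restart-agent)
            # Assumption: the agent runs as the systemd unit cloudstack-agent.
            echo 'systemctl restart cloudstack-agent'
            ;;
        log-only)
            # No automatic action; the caller only logs and alerts.
            echo ''
            ;;
        *)
            # Assumption: unrecognized values fall back to the default.
            echo 'echo b > /proc/sysrq-trigger'
            ;;
    esac
}
```

For example, 'fence_command graceful-reboot' prints 'systemctl reboot',
and 'fence_command log-only' prints an empty line.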

The existing 'reboot.host.and.alert.management.on.heartbeat.timeout'
boolean continues to function as a complete Java-side bypass.

Refs: https://github.com/apache/cloudstack/issues/13089
2026-05-01 03:08:35 +03:00
Name | Last commit | Date
external/provisioner | extension: improve host vm power reporting (#11619) | 2026-01-30 14:07:22 +05:30
kvm | KVM: make storage heartbeat fence action configurable | 2026-05-01 03:08:35 +03:00
ovm3 | removed code in comments (#11145) | 2025-12-08 16:31:48 +01:00
vmware | Add Python flake8 linting for W291 trailing whitespace with Super-Linter (#4687) | 2022-03-28 11:40:26 -03:00
xenserver | Add support for vTPM for XenServer and XCP-ng 8.3/8.4 (#12263) | 2026-01-28 13:12:32 +02:00
update_host_passwd.sh | Fixes script that perform change password on hosts (#6783) | 2022-12-16 17:21:44 +01:00
versions.sh | interpret /etc/redhet-release better (#7570) | 2023-06-09 10:29:42 +02:00