cloudstack/scripts
James Peru d603b260c4 KVM: make storage heartbeat fence action configurable
The KVM agent's storage heartbeat scripts (kvmheartbeat.sh and
kvmspheartbeat.sh) hard-code an immediate kernel-level reboot via
'echo b > /proc/sysrq-trigger' when a heartbeat write to primary storage
times out. This bypasses all OS-level shutdown protections, drops every
running VM on the host instantly, and triggers HA cascades onto
surviving hosts.

For NFS shared storage the binary "heartbeat-write-failed = host-is-dead"
heuristic is reasonable. For LINSTOR/DRBD or other replicated local
storage, the same disk serves application I/O, replication I/O and
heartbeat I/O simultaneously - so a transient I/O contention spike can
time out the heartbeat write without the host actually being unhealthy.
The result is false-positive sysrq fencing.

Adds a new agent.properties option:

    kvm.heartbeat.fence.action = reboot | graceful-reboot
                               | restart-agent | log-only

Default value is "reboot" so existing deployments keep their current
behavior. Operators on replicated storage backends can choose a less
destructive action:

  - graceful-reboot: 'systemctl reboot' instead of sysrq, allowing VMs
    a chance to shut down cleanly
  - restart-agent: restart cloudstack-agent only, preserving running VMs
  - log-only: log + alert, no automatic action

The existing 'reboot.host.and.alert.management.on.heartbeat.timeout'
boolean continues to function as a complete Java-side bypass.

Refs: https://github.com/apache/cloudstack/issues/13089
2026-05-01 03:08:35 +03:00
..
installer removed code in comments (#11145) 2025-12-08 16:31:48 +01:00
network pre-commit: add `XML` files to the `trailing-whitespace` check (#9131) 2024-07-12 09:42:54 +02:00
storage Merge release branch 4.22 to main 2026-01-26 13:32:56 +01:00
util Replace deprecated 'egrep' commands with 'grep -E'. (#12306) 2025-12-22 14:27:41 +01:00
vm KVM: make storage heartbeat fence action configurable 2026-05-01 03:08:35 +03:00