cloudstack/engine
Suresh Kumar Anaparti 95489b8bdd
Direct agents rebalance improvements with multiple management server nodes (#10674)
Sometimes hypervisor hosts (direct agents) get stuck in the Disconnect state during agent rebalancing across multiple management server nodes. This issue was noticed when the management server nodes in the cluster were restarted frequently.

When there are multiple management server nodes in a cluster and one or more nodes are shut down, started, or restarted, CloudStack rebalances the hosts among the remaining nodes or moves hosts to the newly joined management server nodes. During the rebalancing period, multiple operations can happen, including:

- DirectAgentScan, run at the interval configured by direct.agent.scan.interval
- AgentRebalanceScan to identify and schedule rebalance agents
- TransferAgentScan to transfer the host from original owner to future owner

**Current Rebalance behavior**

1. For hosts that have an AgentAttache that is not forForward but are in Disconnect state, CloudStack simply ignores these hosts without trying to ping them again or update their status.
2. For hosts that have an AgentAttache that is forForward, CloudStack removes the agent but still tries to loadDirectlyConnectedHost.

**Improved Rebalance behavior**
During DirectAgentScan, scanDirectAgentToLoad() identifies self-managed hosts that are in Disconnect state (disconnected after pingtimeout).

1. For hosts that have an AgentAttache that is forForward, CloudStack should remove the agent.
2. For hosts that have an AgentAttache that is not forForward but are in Disconnect state, CloudStack should try to investigate and update the status to Up if the host is pingable.
3. For hosts that don't have an AgentAttache, CloudStack should try to loadDirectlyConnectedHost.
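
The three cases above can be read as a small decision function. The following is a minimal, self-contained sketch of that branching only, not the code from this commit: the class `DirectAgentScanSketch`, the `Attache` stand-in, the `HostStatus` and `Action` enums, and the `decide` method are all hypothetical names introduced for illustration, and the fall-through case (an attache that is neither forForward nor on a disconnected host) is an assumption, since the commit message does not describe it.

```java
import java.util.Optional;

public class DirectAgentScanSketch {

    /** Hypothetical host states; only the two needed for this sketch. */
    enum HostStatus { UP, DISCONNECTED }

    /** Hypothetical stand-in for an agent attache; forForward marks a forwarding attache. */
    static final class Attache {
        final boolean forForward;

        Attache(boolean forForward) {
            this.forForward = forForward;
        }
    }

    /** One action per case in the list above, plus NONE for the undescribed case. */
    enum Action { REMOVE_AGENT, INVESTIGATE_AND_UPDATE_STATUS, LOAD_DIRECTLY_CONNECTED_HOST, NONE }

    /** Chooses the action for a single host found by the scan. */
    static Action decide(Optional<Attache> attache, HostStatus status) {
        if (!attache.isPresent()) {
            // Case 3: no attache, so try to load the directly connected host.
            return Action.LOAD_DIRECTLY_CONNECTED_HOST;
        }
        if (attache.get().forForward) {
            // Case 1: a forwarding attache is removed.
            return Action.REMOVE_AGENT;
        }
        if (status == HostStatus.DISCONNECTED) {
            // Case 2: a non-forwarding attache on a disconnected host is investigated;
            // the status would be set to Up if the host turns out to be pingable.
            return Action.INVESTIGATE_AND_UPDATE_STATUS;
        }
        // Assumption: nothing to do for a non-forwarding attache on a host that is not disconnected.
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(Optional.empty(), HostStatus.DISCONNECTED));
        System.out.println(decide(Optional.of(new Attache(true)), HostStatus.DISCONNECTED));
        System.out.println(decide(Optional.of(new Attache(false)), HostStatus.DISCONNECTED));
    }
}
```

Checking for a missing attache first keeps each branch aligned with exactly one numbered case, which also makes it visible that the old behavior (removing a forwarding agent and then still loading the host) combined two of these branches.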
2025-05-13 17:47:46 +05:30
api api,ui: multi arch improvements (#10289) 2025-04-25 11:02:27 +02:00
components-api Add new config (non-dynamic) for agent connections monitor thread, and keep timeunit to secs (in sync with the earlier Wait config) (#10525) 2025-04-28 15:32:03 +02:00
orchestration Direct agents rebalance improvements with multiple management server nodes (#10674) 2025-05-13 17:47:46 +05:30
schema Support XenServer 8.4 / XCP 8.3 - make scripts python3 compatible (#10684) 2025-05-13 12:35:04 +02:00
service Updating pom.xml version numbers for release 4.19.3.0-SNAPSHOT 2025-02-25 10:43:11 +01:00
storage ehancement: add password to configdrive vendor_data.json (#10061) 2025-05-12 16:16:54 +02:00
userdata Updating pom.xml version numbers for release 4.19.3.0-SNAPSHOT 2025-02-25 10:43:11 +01:00
pom.xml Updating pom.xml version numbers for release 4.19.3.0-SNAPSHOT 2025-02-25 10:43:11 +01:00