Sometimes hypervisor hosts (direct agents) stuck with Disconnect state during agent rebalancing activity across multiple management server nodes. This issue was noticed during frequent restart of the management server nodes in the cluster.
When there are multiple management server nodes in a cluster, if one or more nodes are shutdown/start/restart, CloudStack will rebalance the hosts among the remaining nodes or move the nodes to the newly joined management server nodes. During the rebalancing period multiple operations could happen including:
- DirectAgentScan at interval of configured direct.agent.scan.interval
- AgentRebalanceScan to identify and schedule rebalance agents
- TransferAgentScan to transfer the host from original owner to future owner
**Current Rebalance behavior**
1. For hosts that have AgentAttache && not forForward but in Disconnect state, CloudStack simply ignore these hosts without trying to ping again or update the status of the host.
2. For hosts that have AgentAttache && forForward, CloudStack removes the agent but still try to loadDirectlyConnectedHost.
**Improved Rebalance behavior**
During DirectAgentScan: scanDirectAgentToLoad(), identify hosts that for self-managed hosts that are in Disconnect state (disconnected after pingtimeout).
1. For hosts that have AgentAttache and is forForward, CloudStack should remove the agent
2. For hosts that have AgentAttache and is not forForward but in Disconnect state, CloudStack should try to investigate and update the status to Up if host is pingable.
3. For hosts that don't have AgentAttache, CloudStack should try to loadDirectlyConnectedHost.
* Don't set signingRegion as auto for creating the s3 client in ceph object store provider.
* replace getBucketAcl with doesBucketExistV2 in CephObjectStoreDriverImplTest
* KVM: add Virtual TPM model and version
* KVM: add admin-only VM setting GUEST.CPU.MODE and GUEST.CPU.MODEL
* VMware: add vTPM
* vTPM: do not set Key due to 'Cannot add multiple devices using the same device key..'
* vTPM: add unit test testTpmModel
* engine/schema: remove user vm details for guest CPU mode/model
* vTPM: extra methods as Daan's requests
* vTPM: add unit tests in VmwareResourceTest
* vTPM: update unit tests in VmwareResourceTest
* vTPM: add unit test in LibvirtComputingResourceTest
* vTPM: use the default TPM version if an invalid version is passed
* vTPM: requires UEFI on vmware and do nothing if it is not enabled/disabled
* vTPM: let uses to add UEFI on vmware
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* vTPM: remove template details for guest CPU mode/model
* UI: boot vm from ISO into UEFI/SECURE mode
---------
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Fix unit tests due to change in behavior of restore VM
* update numbering in comments
* revert delete operations
* fix placement of start vm after refactoring
it does not work with python3
```
2025-04-18T10:43:58.5235913Z 2025-04-18 10:32:20,503 - CRITICAL - EXCEPTION: Failure:: ['Traceback (most recent call last):\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 59, in testPartExecutor\n yield\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 591, in run\n self._callTestMethod(testMethod)\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 549, in _callTestMethod\n method()\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/failure.py", line 35, in runTest\n raise self.exc_val.with_traceback(self.tb)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/loader.py", line 335, in loadTestsFromName\n module = self.importer.importFromPath(\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 162, in importFromPath\n return self.importFromDir(dir_path, fqname)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 198, in importFromDir\n mod = load_module(part_fqname, fh, filename, desc)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 128, in load_module\n spec.loader.exec_module(mod)\n', ' File "<frozen importlib._bootstrap_external>", line 883, in exec_module\n', ' File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed\n', ' File "/home/runner/work/cloudstack/cloudstack/test/integration/smoke/test_certauthority_root.py", line 27, in <module>\n from OpenSSL.crypto import FILETYPE_PEM, verify, X509\n', "ImportError: cannot import name 'verify' from 'OpenSSL.crypto' (unknown location)\n"]
```