Commit Graph

14 Commits

Author SHA1 Message Date
Suresh Kumar Anaparti 5caf6cd043
Merge branch '4.20' into 4.22 2026-02-19 13:19:14 +05:30
Suresh Kumar Anaparti 9dd93cef76
Support for custom SSH port for KVM hosts from the host url on add host and the configuration (#12571) 2026-02-18 20:05:51 +01:00
Suresh Kumar Anaparti be22bfe2c9
Management Server - Prepare for Maintenance and Cancel Maintenance improvements (#10995)
* Management Server - Prepare for Maintenance and Cancel Maintenance improvements:
- Added new setting 'management.server.maintenance.ignore.maintenance.hosts' to ignore hosts in maintenance states  while preparing management server for maintenance. This skips agent transfer and agents count check for hosts in maintenance.
- Rebalance indirect agents after cancel maintenance, using rebalance parameter in cancelMaintenance API
- Force maintenance after maintenance window timeout, using forced parameter in prepareForMaintenance API.
- Propagate 'indirect.agent.lb.check.interval' setting change to the host agents.

* rebases fixes

* code improvements, cleanup

* [UI] Set rebalance true by default in cancel maintenance dialog

* Update MS state after executing cluster cmd in the target MS, and some code improvements

* code improvements

* Ensure the host lb algorithm 'shuffle' is applied once before disabling the indirect agent lb check background task
2025-07-03 12:17:04 +05:30
Daan Hoogland d7d9d131b2 Merge branch '4.20' 2025-05-01 15:44:09 +02:00
Suresh Kumar Anaparti 9f229600e6
Add new config (non-dynamic) for agent connections monitor thread, and keep timeunit to secs (in sync with the earlier Wait config) (#10525) 2025-04-28 15:32:03 +02:00
Suresh Kumar Anaparti 9dceae4614
MS maintenance improvements (#10417)
* Update last agents during ms maintenance, and some code improvements

* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]

* Migrate systemvm agents before routing host agents, and some code improvements

* Added events for ms maintenance and shutdown operations

* Added the following ms maintenance and shutdown improvements

- block new agent connections during prepare for maintenance of ms

- maintain avoids ms list

- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents

- updated setup ms list and migrate agent connections to executor service

- migrate agent connection through executor, and send the answer to the ms host that initiated the migration

- re-initialize ssl handshake executor if it is shutdown

- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states

- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states

- stop agent connections monitor on ms maintenance

- update avoid ms list in ready command

- updated connected host from the client connection

- update last agents in ms metrics from the database

- updated some agent config descriptions

- update last management server in the hosts during shutdown

- added agents and lastagents in management server response

- updated management server maintenance & shutdown unit tests

- some code improvements

* refactored code / addressed comments

* removed shutdown testcase (maybe, calling System.exit)

* Revert "removed shutdown testcase (maybe, calling System.exit)"

This reverts commit e14b071715.

* avoid system.exit during shutdown test

* code improvements

* testcase fix

* Fix cutoff time in agent connections monitor thread
2025-03-19 14:18:05 +05:30
Suresh Kumar Anaparti 3b108b968f
Support for Management Server Maintenance Mode (#9854)
* Support for Management Server Maintenance

- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.

- New management server states for maintenance: PreparingForMaintenance, Maintenance.

- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.

- Support management server maintenance when more than one active management servers available.

- Triggers transfer agents to other available management servers for maintenance, new agent command MigrateAgentConnectionCommand to initiate transfer of indirect agents.

- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.

- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.

* Updated pending jobs check timer task with ScheduledExecutorService

* keep maintenance state on trigger shutdown call when ms is in maintenance

* add pending jobs count to ms response

* during ms heartbeat, update state to up only when it's down

* allow vm work jobs of async job created before prepare for maintenance

* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"

This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.

* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
2025-01-29 13:31:15 +05:30
Harikrishna 9bc283e5c2
Introducing granular command timeouts global setting (#9659)
* Introducing granular command timeouts global setting

* fix marvin tests

* Fixed log messages

* some more log message fix

* Fix empty value setting

* Converted the global setting to non-dynamic

* set wait on command only when granular wait is defined. This is to keep the backward compatibility

* Improve error logging
2025-01-07 17:06:32 +05:30
Vishesh fedcf66de0
Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (#8547)
This PR fixes bug introduced in #8502. Timeout for script execution was set to 60 ms instead of 60s which resulted in host not getting UEFI enabled. This is a blocker for 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.

We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout to avoid confusion for the user. Since, the required timeout can vary depending on the provider in use and we are only setting the wait for default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5` making the default value of 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time, the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)
2024-01-27 23:36:13 +05:30
Nicolas Vazquez be66eb2a35
Auto Enable/Disable KVM hosts (#7170)
* Auto Enable Disable KVM hosts

* Improve health check result

* Fix corner cases

* Script path refactor

* Fix sonar cloud reports

* Fix last code smells

* Add marvin tests

* Fix new line on agent.properties to prevent host add failures

* Send alert on auto-enable-disable and add annotations when the setting is enabled

* Address reviews

* Add a reason for enabling or disabling a host when the automatic feature is enabled

* Fix comment on the marvin test description

* Fix for disabling the feature if the admin has manually updated the host resource state before any health check result
2023-04-04 17:03:37 +05:30
Rohit Yadav b4fdf22397
kvm: fix/optimize propogating configs (#3911)
Make some changes based on @nvazquez 's comments in PR #3491
Fix a bug in #3491
2020-03-05 12:20:51 +05:30
Wei Zhou ac7bcde45b
KVM: Propagating changes on host parameters to the agents (#3491) 2020-02-19 13:13:37 +00:00
Rafael Weingärtner 972b8b71d7
CLOUDSTACK-8855 Improve Error Message for Host Alert State and reconnect host API. (#2387)
* CLOUDSTACK-8855 Improve Error Message for Host Alert State

* [CLOUDSTACK-9846] create column to save the content of alert messages

Remove declaration of throws CloudRuntimeException
I also removed some unused variables and comments left behind

This closes #837

* Isolate a problematic test "smoke/test_certauthority_root"
2018-03-14 15:27:43 -03:00
Marc-Aurèle Brothier 893a88d225 CLOUDSTACK-10105: Use maven standard project structure in all projects (#2283)
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.

- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml

Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
2018-01-20 03:19:27 +05:30