* FR-248: Instance lease, WIP commit
* insert lease expiry into db and use that to filter exiring vms, add asyncjobmanager
* Add leaseDuration and leaseExpiryAction in Service offering create flow
* Update listVM cmd to allow listing only leased instances
* Add methods to fetch instances for which lease is expiring in next days
* Changes included:
config key setup and configured for alert email
lease options in create and update vm screen
handle delete protection, edit vm, create vm
validated stop and detroy, delete protection
* Update UI screens for leased properties coming from config and service offering
* use global lock before running scheduler
* Unit tests
* Flow changes done in UI based on discussion
* Include view changes in schema upgrade files and use feature in various UI elements
* Added integration test for vm deployment, UI enhancements for user persona, bug fixes
* validate integration tests, minor ui changes and log messages
* fix build: moving configkey from setup to test itself
* Disable testAlert to unblock build and trim whitespaces in integration tests
* Address review comments
* Minor changes in EditVM screen
* Use ExecutorService instead of Timer and TimerTask
* Additional review comments
* Incorporate following changes:
1. Execute lease action once on the instance
2. Cancel lease on instance when feature is disabled
3. Relevant events when lease gets disabled, cancelled, executed
4. Disable associating lease after deployment
5. UI elements and flow changes
6. Changes based on feedback from demo
* Handle pr review comments
* address review comments
* move instance.lease.enabled config to VMLeaseManager interface
* bug fix in edit instance flow and reject api request for invalid values
* max allowed lease is for 100 years
* log instance ids for expired instance
* Fix config validation for value range and code coverage improvement
* fix lease expiry request failures in async
* dont use forced: true for StopVmCmd
* Update server/src/main/java/org/apache/cloudstack/vm/lease/VMLeaseManager.java
Co-authored-by: Vishesh <vishesh92@gmail.com>
* handle review comments
---------
Co-authored-by: Rohit Yadav <rohityadav89@gmail.com>
Co-authored-by: Vishesh <vishesh92@gmail.com>
* VMware - Ignore disk not found error on cleanup when the VM disk doesn't exists
* VMware - Retry powerOn on lock issues
* addressed comments
* Update CPVM reboot tests - wait for the agent to Disconnect and back Up
* Retry moveDatastoreFile when any file access issue while creating volume from snapshot
* Update full clone flag when restoring vm using root disk offering with more size than the template size
* refactored (mainly,for diskInfo - causing NPE in some cases)
* Retry moveDatastoreFile when there is any file access issue
* Fix unit tests due to change in behavior of restore VM
* update numbering in comments
* revert delete operations
* fix placement of start vm after refactoring
it does not work with python3
```
2025-04-18T10:43:58.5235913Z 2025-04-18 10:32:20,503 - CRITICAL - EXCEPTION: Failure:: ['Traceback (most recent call last):\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 59, in testPartExecutor\n yield\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 591, in run\n self._callTestMethod(testMethod)\n', ' File "/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/unittest/case.py", line 549, in _callTestMethod\n method()\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/failure.py", line 35, in runTest\n raise self.exc_val.with_traceback(self.tb)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/loader.py", line 335, in loadTestsFromName\n module = self.importer.importFromPath(\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 162, in importFromPath\n return self.importFromDir(dir_path, fqname)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 198, in importFromDir\n mod = load_module(part_fqname, fh, filename, desc)\n', ' File "/home/runner/.local/lib/python3.10/site-packages/nose/importer.py", line 128, in load_module\n spec.loader.exec_module(mod)\n', ' File "<frozen importlib._bootstrap_external>", line 883, in exec_module\n', ' File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed\n', ' File "/home/runner/work/cloudstack/cloudstack/test/integration/smoke/test_certauthority_root.py", line 27, in <module>\n from OpenSSL.crypto import FILETYPE_PEM, verify, X509\n', "ImportError: cannot import name 'verify' from 'OpenSSL.crypto' (unknown location)\n"]
```
make tests easier to run and without modifying the test to match the
correct system. Also add tests for volume and vm instance snapshots.
Improve runtime by ~6 minutes.
* api,agent,server,engine-schema: scalability improvements
Following changes and improvements have been added:
- Improvements in handling of PingRoutingCommand
1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/use role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
1. CapacityManager - listing service offerings - beneficial in host capacity calculation
2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
3. DownloadListener - hypervisors for zone - beneficial for host joins
5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimize finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
1. Optimized the use of the executor service
2. Refactore Agent class to better handle connections.
3. Do SSL handshakes within worker threads
5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* test1, domaindetails, capacitymanager fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* test2 - agent tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* capacitymanagertest fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* change
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix missing changes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert marvin/setup.py
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix indent
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* use space in sql
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address duplicate
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* update host logs
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert e36c6a5d07
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix npe in capacity calculation
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* move schema changes to 4.20.1 upgrade
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix build
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add some more tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* checkstyle fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* remove unnecessary mocks
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* replace statics
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* engine/orchestration,utils: limit number of concurrent new agent
connections
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* refactor - remove unused
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unregister closed connections, monitor & cleanup
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add check for outdated vm filter in power sync
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* agent: synchronize sendRequest wait
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Support for Management Server Maintenance
- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.
- New management server states for maintenance: PreparingForMaintenance, Maintenance.
- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.
- Support management server maintenance when more than one active management servers available.
- Triggers transfer agents to other available management servers for maintenance, new agent command MigrateAgentConnectionCommand to initiate transfer of indirect agents.
- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.
- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.
* Updated pending jobs check timer task with ScheduledExecutorService
* keep maintenance state on trigger shutdown call when ms is in maintenance
* add pending jobs count to ms response
* during ms heartbeat, update state to up only when it's down
* allow vm work jobs of async job created before prepare for maintenance
* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"
This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.
* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
* 4.20:
merge errors fixed
Restrict the migration of volumes attached to VMs in Starting state (#9725)
server, plugin: enhance storage stats for IOPS (#10034)
Introducing granular command timeouts global setting (#9659)
Improve logging to include more identifiable information (#9873)
* Introducing granular command timeouts global setting
* fix marvin tests
* Fixed log messages
* some more log message fix
* Fix empty value setting
* Converted the global setting to non-dynamic
* set wait on command only when granular wait is defined. This is to keep the backward compatibility
* Improve error logging