- Enhances the Host connection logic to avoid a connection storm (where the Agent opens multiple sockets against the Management Server).
- Implements the HostConnectProcess task, in which the Host checks upon connection whether the lock is available and traces connection progress, status and timeout.
- Introduces AgentConnectStatusCommand, where Host checks whether lock for the Host is available (i.e. "previous" connect process is finished).
- Implements logic to check whether the Management Server holds a lock against the Host (exposes MySQL DB lock presence via API)
- Removes synchronization on the Host disconnect process and the double-disconnect logic in a clustered Management Server environment; adds early removal from the ping map (when a ping timeout delay is combined with a synchronized disconnect process, the Agent Manager submits more disconnect requests)
- Introduces parameterized connection and status check timeouts
- Implements a backoff algorithm abstraction: either a constant backoff timeout or an exponential backoff with jitter can be used to wait between Host connection attempts to the Management Server
- Implements ServerAttache to be used on the Agent side of communication (similar to Attache on Management Server side)
- Significantly enhances/adds logs in the Host Agent and Agent Manager logic to trace the Host connecting and disconnecting process, including ids, names, context UUIDs and timings (how long overall initialization/deinitialization took)
- Adds logs to communication between Management Servers (PDU requests)
- Adds DB indexes to improve search performance, uses IDEMPOTENT_ADD_INDEX for safer DB schema updates
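The backoff abstraction mentioned above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the change: the class names, the `next_delay` method, and the choice of the "full jitter" variant (a uniform delay between zero and the exponential ceiling) are all assumptions made for the example.

```python
import random
from abc import ABC, abstractmethod
from typing import Optional


class Backoff(ABC):
    """Hypothetical interface: how long to wait before the next connect attempt."""

    @abstractmethod
    def next_delay(self, attempt: int) -> float:
        """Return the delay in seconds for the given zero-based attempt number."""


class ConstantBackoff(Backoff):
    """Always waits the same amount of time between attempts."""

    def __init__(self, delay_s: float):
        self.delay_s = delay_s

    def next_delay(self, attempt: int) -> float:
        return self.delay_s


class ExponentialBackoffWithJitter(Backoff):
    """Doubles the ceiling on every attempt and picks a random delay below it.

    Assumption: "full jitter" (uniform in [0, ceiling]); other jitter
    strategies exist and the real implementation may use a different one.
    """

    def __init__(self, base_s: float, max_s: float,
                 rng: Optional[random.Random] = None):
        self.base_s = base_s
        self.max_s = max_s
        self.rng = rng or random.Random()

    def next_delay(self, attempt: int) -> float:
        # Cap the exponent so very large attempt numbers cannot overflow a float.
        exponent = min(attempt, 32)
        ceiling = min(self.max_s, self.base_s * (2 ** exponent))
        return self.rng.uniform(0.0, ceiling)
```

The jitter spreads reconnect attempts from many Hosts over time, so they do not all retry at the same instant after a Management Server restart.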
This PR aligns the use of terminology, renaming VM / virtual machine references to 'Instance' and capitalising the terms Templates, Network, Snapshot, User, Account in CloudStack APIs, error and log messages, events, tooltips, etc. Many typos, grammar and spelling mistakes were fixed, and terms like IPv4, VPN, VPC, etc. were properly capitalised. Some error messages were cleaned up for better readability. The test cases that expect specific exception strings were adjusted accordingly.
Here is the wiki page describing the changes in detail:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Object+Naming+and+Title+Case+Convention
---------
Co-authored-by: Manoj Kumar <manojkr.itbhu@gmail.com>
Co-authored-by: Harikrishna <harikrishna.patnala@gmail.com>
* Add source VM name on virt-v2v migration log entries
* Improve the feedback by displaying the running importing tasks
* Add source VM name prefix on more conversion logs
* Improve listing and also list completed tasks
* Pass extra parameters to virt-v2v if administrator allows via global setting
* Add Force converting directly to storage pool option
* Refactor based on review comments
* Add properties for env vars for the instance conversion
* Add separate component for Import VM Tasks
* applying copilot suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix importing unmanaged instances due to incorrect internal name
* Add VM prefix on each log operation for conversion
* Log the original VM name instead of the cloned VM in case of cloning
* Allow searching storage pool by UUID after conversion to support SharedMountPoint
* Fix search pools logic
* Improve UI and add checks for force convert to pool parameter
* Support Local storage when forceconverttopool is set to true
* Add config key for allowed extra params and add validation
* Fix params lists
* Fix compile error
* Remove extra stubbings
* Fix extra params execution
---------
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
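The validation of extra virt-v2v parameters against an administrator-defined allow list could look roughly like the sketch below. It is written in Python for illustration only; the function name, the allow-list format (option names without leading dashes) and the sample option strings are assumptions, not the actual config key semantics.

```python
import shlex
from typing import List, Set


def validate_extra_params(extra_params: str, allowed: Set[str]) -> List[str]:
    """Split the extra parameter string and reject options not in the allow list.

    Hypothetical helper: 'allowed' holds option names without leading dashes.
    Tokens that do not start with '-' are treated as option values.
    """
    tokens = shlex.split(extra_params)
    for token in tokens:
        if token.startswith("-"):
            # "--name=value" and "--name value" both yield the option name "name".
            name = token.lstrip("-").split("=", 1)[0]
            if name not in allowed:
                raise ValueError("extra parameter not allowed: " + token)
    return tokens
```

Returning the tokens as a list (rather than one string) lets the caller append them to the command's argument list without going through a shell.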
This change adds a configuration property called create.full.clone to the agent.properties file. When set to true, all QCOW2 volumes are created as full clones. If false (default), the current behavior remains, where only FAT and SPARSE volumes are full clones and THIN volumes are linked clones.
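The decision described above can be summarised in a small sketch. The function and parameter names are hypothetical and the code is Python for illustration; only the rule itself (the property forces full clones, otherwise FAT and SPARSE are full clones and THIN is a linked clone) comes from the description.

```python
def is_full_clone(provisioning_type: str, create_full_clone: bool = False) -> bool:
    """Return True if a QCOW2 volume should be created as a full clone.

    create_full_clone mirrors the agent.properties entry, e.g.:
        create.full.clone=true
    """
    if create_full_clone:
        # Property enabled: every QCOW2 volume is a full clone.
        return True
    # Default behavior: only FAT and SPARSE volumes are full clones.
    return provisioning_type.upper() in {"FAT", "SPARSE"}
```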
This PR allows attaching GPU devices via PCI, mdev or VF to an Instance for KVM.
It allows the operator to discover the GPU devices on the KVM host and create a Compute Offering with GPU support based on the available GPU devices on the host. Once the operator has created the Compute Offering, users can use it to launch Instances with GPU devices.
* [PowerFlex/ScaleIO] Added wait time after SDC service start/restart/stop, and retries to fetch SDC id/guid
* Added agent property 'powerflex.sdc.service.wait' for the time (in secs) to wait after SDC service start/restart/stop
* code improvements
* Cumulative enhancements and fixes for ScaleIO: MDM add/remove, Host prepare/unprepare, and validation that the Storage Pool can be created in the Agent.
- Implemented validation to fail Host disconnect from Storage Pool if there are Volumes attached and SDC client MDM removal requires scini service to be restarted
- Implemented Storage Pool validation by checking whether MDM addresses from the configuration file and from memory (using CLI) match, otherwise fail the ModifyStoragePool command.
- Introduced configuration key to apply timeout after making MDM changes for ScaleIO: powerflex.mdm.change.apply.timeout.ms (default 1000ms)
- Implemented logic to apply timeout after making MDM changes for ScaleIO in prepare and unprepare logic
- Added detection of MDM removal support via CLI
- If MDM removal via CLI is supported, use the CLI; otherwise fall back to editing drv_cfg.txt and restarting scini
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
Co-authored-by: mprokopchuk <mprokopchuk@apple.com>
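The "wait, then retry fetching the SDC id/guid" behavior described above follows a common pattern, sketched here in Python for illustration. The helper name, the retry count and the injectable `fetch`/`sleep` callables are assumptions; the wait time corresponds conceptually to the 'powerflex.sdc.service.wait' agent property (in seconds).

```python
import time
from typing import Callable, Optional


def fetch_with_retries(fetch: Callable[[], Optional[str]],
                       retries: int = 3,
                       wait_s: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch() up to 'retries' times, waiting 'wait_s' seconds between tries.

    Returns the first non-empty value, or None if every attempt came back empty.
    """
    for attempt in range(1, retries + 1):
        value = fetch()
        if value:
            return value
        if attempt < retries:
            # Give the service time to settle before asking again.
            sleep(wait_s)
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.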
* Management Server - Prepare for Maintenance and Cancel Maintenance improvements:
- Added new setting 'management.server.maintenance.ignore.maintenance.hosts' to ignore hosts in maintenance states while preparing management server for maintenance. This skips agent transfer and agents count check for hosts in maintenance.
- Rebalance indirect agents after cancel maintenance, using rebalance parameter in cancelMaintenance API
- Force maintenance after maintenance window timeout, using forced parameter in prepareForMaintenance API.
- Propagate 'indirect.agent.lb.check.interval' setting change to the host agents.
* rebase fixes
* code improvements, cleanup
* [UI] Set rebalance true by default in cancel maintenance dialog
* Update MS state after executing cluster cmd in the target MS, and some code improvements
* code improvements
* Ensure the host lb algorithm 'shuffle' is applied once before disabling the indirect agent lb check background task
* KVM incremental snapshot feature
* fix log
* fix merge issues
* fix creation of folder
* fix snapshot update
* Check for hypervisor type during parent search
* fix some small bugs
* fix tests
* Address reviews
* do not remove storPool snapshots
* add support for downloading diff snaps
* Add multiple zones support
* make copied snapshots have normal names
* address reviews
* Fix in progress
* continue fix
* Fix bulk delete
* change log to trace
* Start fix on multiple secondary storages for a single zone
* Fix multiple secondary storages for a single zone
* Fix tests
* fix log
* remove bitmaps when deleting snapshots
* minor fixes
* update sql to new file
* Fix merge issues
* Create new snap chain when changing configuration
* add verification
* Fix snapshot operation selector
* fix bitmap removal
* fix chain on different storages
* address reviews
* fix small issue
* fix test
---------
Co-authored-by: João Jandre <joao@scclouds.com.br>
* cloudstack: add support for EL10
This adds support for Fedora 40 and the (upcoming) EL10 distro to be used
as mgmt/usage server, mysql/nfs & KVM host. The Python 3 version has changed
to 3.12.9, for which the python-path isn't determined automatically.
* python: WIP code, this fails right now
Need to discuss/check if we can skip this code. Where/how is the cgroup
setup used with the KVM agent?
* prep cloudutils to be EL10 ready
Fixes an issue on Fedora, where it was running old EL6 hooks that aren't
applicable to modern Fedora versions, which are closer to EL8/9/10