1. Remove usage of in-memory eventBus for VM lifecycle events
2. Publish event for VM, NIC and DNS record delete events into messageBus
3. Introducer subscribers for above topics
* Mark VMs in error state when expunge fails during destroy operation
* fetch volume by external id (used by external plugins)
* review comments
* Update reorder hosts log to DEBUG, log line is too verbose to have on as INFO
This PR aligns the use of terminology, renaming VM / virtual machine references to 'Instance' and also capitalising the terms Templates, Network, Snapshot, User, Account in CloudStack APIs, error and log messages, events, tooltips, etc. Many typos, grammar and spelling mistakes were fixed, also terms like IPv4, VPN, VPC, etc. were properly capitalised. Some error messages were cleaned for better readability. The test cases, expecting some exception strings were adjusted accordingly.
Here is the wiki page, describing the changes in details:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Object+Naming+and+Title+Case+Convention
---------
Co-authored-by: Manoj Kumar <manojkr.itbhu@gmail.com>
Co-authored-by: Harikrishna <harikrishna.patnala@gmail.com>
CS creates transient KVM domain.xml. When instance is unmanaged from CS, explicit dump of domain has to be taken to manage is outside of CS.
With this PR
domainXML gets backed up and becomes persistent for further management of Instance.
Stopped instance also can be unmanaged, last host for instance is considered for defining domain
hostid param is supported in unmanageVirtualMachine API for KVM hypervisor and for stopped Instances
hostid field in response of unmanageVirtualMachine, representing host used for unmanage operation
Disable unmanaging instance with config drive, can unmanage from API using forced=true param for KVM
* draas initial changes
* Added option to enable disaster recovery on a backup respository. Added UpdateBackupRepositoryCmd api.
* Added timeout for mount operation in backup restore configurable via global setting
* Addressed review comments
* fix for simulator test failures
* Added UT for coverage
* Fix create instance from backup ui for other providers
* Added events to add/update backup repository
* Fix race in fetchZones
* One more fix in fetchZones in DeployVMFromBackup.vue
* Fix zone selection in createNetwork via Create Instance from backup form.
* Allow template/iso selection in create instance from backup ui
* rename draasenabled to crosszoneinstancecreation
* Added Cross-zone instance creation in test_backup_recovery_nas.py
* Added UT in BackupManagerTest and UserVmManagerImplTest
* Integration test added for Cross-zone instance creation in test_backup_recovery_nas.py
* Remove allocated snapshots / vm snapshots on start
* Check and Cleanup snapshots / vm snapshots on MS start
* rebase fixes
* Update volume state (from Snapshotting) on MS start when its snapshot job not finished and snapshot in Creating state
This feature adds the ability to create a new instance from a VM backup for dummy, NAS and Veeam backup providers. It works even if the original instance used to create the backup was expunged or unmanaged. There are two parts to this functionality:
Saving all configuration details that the VM had at the time of taking the backup. And using them to create an instance from backup.
Enabling a user to expunge/unmanage an instance that has backups.
The Extensions Framework in Apache CloudStack is designed to provide a flexible and standardised mechanism for integrating external systems and custom workflows into CloudStack’s orchestration process. By defining structured hook points during key operations—such as virtual machine deployment, resource preparation, and lifecycle events—the framework allows administrators and developers to extend CloudStack’s behaviour without modifying its core codebase.
* Option to deploy a VM with existing volume/snapshot
* smoke test changes
check if the hypervisor is KVM
check if the primary storage's scope is ZONE wide
* skip all tests if the storage isn't Zone-Wide and the hypervisor isn't KVM
* support StorPool tags
add StorPool tags to a volume created from snapshot or to a volume which
will be attached as a ROOT to a new VM
* Add StorPool tags on the new ROOT volume
* Add the StorPool's tags when volume is created from a snapshot or a
volume is attached as a ROOT to a VM
* Addressed review
CKS Enhancements:
* Ability to specify different compute or service offerings for different types of CKS cluster nodes – worker, master or etcd
* Ability to use CKS ready custom templates for CKS cluster nodes
* Add and Remove external nodes to and from a kubernetes cluster
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Update remove node timeout global setting
* CKS/NSX : Missing variables in worker nodes
* CKS: Fix ISO attach logic
* CKS: Fix ISO attach logic
* address comment
* Fix Port - Node mapping when cluster is scaled in the presence of external node(s)
* CKS: Externalize control and worker node setup wait time and installation attempts
* Fix logger
* Add missing headers and fix end of line on files
* CKS Mark Nodes for Manual Upgrade and Filter Nodes to add to CKS cluster from the same network
* Add support to deploy CKS cluster nodes on hosts dedicated to a domain
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
* Support unstacked ETCD
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Fix CKS cluster scaling and minor UI improvement
* Reuse k8s cluster public IP for etcd nodes and rename etcd nodes
* Fix DNS resolver issue
* Update UDP active monitor to ICMP
* Add hypervisor type to CKS cluster creation to fix CKS cluster creation when External hosts added
* Fix build
* Fix logger
* Modify hypervisor param description in the create CKS cluster API
* CKS delete fails when external nodes are present
* CKS delete fails when external nodes are present
* address comment
* Improve network rules cleanup on failure adding external nodes to CKS cluster
* UI: Fix etcd template was not honoured
* UI: Fix etcd template was not honoured
* Refactor
* CKS: Exclude etcd nodes when calculating port numbers
* Fix network cleanup in case of CKS cluster failure
* Externalize retries and inverval for NSX segment deletion
* Fix CKS scaling when external node(s) present in the cluster
* CKS: Fix port numbers displayed against ETCD nodes
* Add node version details to every node of k8s cluster - as we now support manual upgrade
* Add node version details to every node of k8s cluster - as we now support manual upgrade
* update column name
* CKS: Exclude etcd nodes when calculating port numbers
* update param name
* update param
* UI: Fix CKS cluster creation templates listing for non admins
* CKS: Prevent etcd node start port number to coincide with k8s cluster start port numbers
* CKS: Set default kubernetes cluster node version to the kubernetes cluster version on upgrade
* CKS: Set default kubernetes cluster node version to the kubernetes cluster version on upgrade
* consolidate query
* Fix upgrade logic
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Fix CKS cluster version upgrade
* CKS: Fix etcd port numbers being skipped
* Fix CKS cluster with etcd nodes on VPC
* Move schema and upgrade for 4.20
* Fix logger
* Fix after rebasing
* Add support for using different CNI plugins with CKS
* Add support for using different CNI plugins with CKS
* remove unused import
* Add UI support and list cni config API
* necessary UI changes
* add license
* changes to support external cni
* UI changes
* Fix NPE on restarting VPC with additional public IPs
* fix merge conflict
* add asnumber to create k8s svc layer
* support cni framework to use as-numbers
* update code
* condition to ignore undefined jinja template variables
* CKS: Do not pass AS number when network ID is passed
* Fix deletion of Userdata / CNI Configuration in projects
* CKS: Add CNI configuration details to the response and UI
* Explicit events for registering cni configuration
* Add Delete cni configuration API
* Fix CKS deployment when using VPC tiers with custom ACLs
* Fix DNS list on VR
* CKS: Use Network offering of the network passed during CKS cluster creation to get the AS number
* CKS cluster with guest IP
* Fix: Use control node guest IP as join IP for external nodes addition
* Fix DNS resolver issue
* Improve etcd indexing - start from 1
* CKS: Add external node to a CKS cluster deployed with etcd node(s) successfully
* CKS: Add external node to a CKS cluster deployed with etcd node(s) successfully
* simplify logic
* Tweak setup-kube-system script for baremetal external nodes
* Consider cordoned nodes while getting ready nodes
* Fix CKS cluster scale calculations
* Set token TTL to 0 (no expire) for external etcd
* Fix missing quotes
* Fix build
* Revert PR 9133
* Add calico commands for ens35 interface
* Address review comments: plan CKS cluster deployment based on the node type
* Add qemu-guest-agent dependency for kvm based templates
* Add marvin test for CKS clusters with different offerings per node type
* Remove test tag
* Add marvin test and fix update template for cks and since annotations
* Fix marvin test for adding and removing external nodes
* Fix since version on API params
* Address review comments
* Fix unit test
* Address review comments
* UI: Make CKS public templates visible to non-admins on CKS cluster creation
* Fix linter
* Fix merge error
* Fix positional parameters on the create kubernetes ISO script and make the ETCD version optional
* fix etcd port displayed
* Further improvements to CKS (#118)
* Multiple nics support on Ubuntu template
* Multiple nics support on Ubuntu template
* supports allocating IP to the nic when VM is added to another network - no delay
* Add option to select DNS or VR IP as resolver on VPC creation
* Add API param and UI to select option
* Add column on vpc and pass the value on the databags for CsDhcp.py to fix accordingly
* Externalize the CKS Configuration, so that end users can tweak the configuration before deploying the cluster
* Add new directory to c8 packaging for CKS config
* Remove k8s configuration from resources and make it configurable
* Revert "Remove k8s configuration from resources and make it configurable"
This reverts commit d5997033ebe4ba559e6478a64578b894f8e7d3db.
* copy conf to mgmt server and consume them from there
* Remove node from cluster
* Add missing /opt/bin directory requrired by external nodes
* Login to a specific Project view
* add indents
* Fix CKS HA clusters
* Fix build
---------
Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
* Add missing headers
* Fix linter
* Address more review comments
* Fix unit test
* Fix scaling case for the same offering
* Revert "Login to a specific Project view"
This reverts commit 95e37563f4.
* Revert "Fix CKS HA clusters" (#120)
This reverts commit 8dac16aa35.
* Apply suggestions from code review about user data
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Update api/src/main/java/org/apache/cloudstack/api/command/user/userdata/BaseRegisterUserDataCmd.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Refactor column names and schema path
* Fix scaling for non existing previous offering per node type
* Update node offering entry if there was an existing offering but a global service offering has been provided on scale
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
Co-authored-by: Daan Hoogland <daan@onecht.net>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* FR-248: Instance lease, WIP commit
* insert lease expiry into db and use that to filter exiring vms, add asyncjobmanager
* Add leaseDuration and leaseExpiryAction in Service offering create flow
* Update listVM cmd to allow listing only leased instances
* Add methods to fetch instances for which lease is expiring in next days
* Changes included:
config key setup and configured for alert email
lease options in create and update vm screen
handle delete protection, edit vm, create vm
validated stop and detroy, delete protection
* Update UI screens for leased properties coming from config and service offering
* use global lock before running scheduler
* Unit tests
* Flow changes done in UI based on discussion
* Include view changes in schema upgrade files and use feature in various UI elements
* Added integration test for vm deployment, UI enhancements for user persona, bug fixes
* validate integration tests, minor ui changes and log messages
* fix build: moving configkey from setup to test itself
* Disable testAlert to unblock build and trim whitespaces in integration tests
* Address review comments
* Minor changes in EditVM screen
* Use ExecutorService instead of Timer and TimerTask
* Additional review comments
* Incorporate following changes:
1. Execute lease action once on the instance
2. Cancel lease on instance when feature is disabled
3. Relevant events when lease gets disabled, cancelled, executed
4. Disable associating lease after deployment
5. UI elements and flow changes
6. Changes based on feedback from demo
* Handle pr review comments
* address review comments
* move instance.lease.enabled config to VMLeaseManager interface
* bug fix in edit instance flow and reject api request for invalid values
* max allowed lease is for 100 years
* log instance ids for expired instance
* Fix config validation for value range and code coverage improvement
* fix lease expiry request failures in async
* dont use forced: true for StopVmCmd
* Update server/src/main/java/org/apache/cloudstack/vm/lease/VMLeaseManager.java
Co-authored-by: Vishesh <vishesh92@gmail.com>
* handle review comments
---------
Co-authored-by: Rohit Yadav <rohityadav89@gmail.com>
Co-authored-by: Vishesh <vishesh92@gmail.com>
* KVM: add Virtual TPM model and version
* KVM: add admin-only VM setting GUEST.CPU.MODE and GUEST.CPU.MODEL
* VMware: add vTPM
* vTPM: do not set Key due to 'Cannot add multiple devices using the same device key..'
* vTPM: add unit test testTpmModel
* engine/schema: remove user vm details for guest CPU mode/model
* vTPM: extra methods as Daan's requests
* vTPM: add unit tests in VmwareResourceTest
* vTPM: update unit tests in VmwareResourceTest
* vTPM: add unit test in LibvirtComputingResourceTest
* vTPM: use the default TPM version if an invalid version is passed
* vTPM: requires UEFI on vmware and do nothing if it is not enabled/disabled
* vTPM: let uses to add UEFI on vmware
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* vTPM: remove template details for guest CPU mode/model
* UI: boot vm from ISO into UEFI/SECURE mode
---------
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
Co-authored-by: Stephan Krug <stephan.krug@scclouds.com.br>
Co-authored-by: Gabriel <gabriel.fernandes@scclouds.com.br>
Co-authored-by: Fabricio Duarte <fabricio.duarte.jr@gmail.com>
* Improve logging to include more identifiable information for kvm plugin
* Update logging for scaleio plugin
* Improve logging to include more identifiable information for default volume storage plugin
* Improve logging to include more identifiable information for agent managers
* Improve logging to include more identifiable information for Listeners
* Replace ids with objects or uuids
* Improve logging to include more identifiable information for engine
* Improve logging to include more identifiable information for server
* Fixups in engine
* Improve logging to include more identifiable information for plugins
* Improve logging to include more identifiable information for Cmd classes
* Fix toString method for StorageFilterTO.java
This is a simple NAS backup plugin for KVM which may be later expanded for other hypervisors. This backup plugin aims to use shared NAS storage on KVM hosts such as NFS (or CephFS and others in future), which is used to backup fully cloned VMs for backup & restore operations. This may NOT be as efficient and performant as some of the other B&R providers, but maybe useful for some KVM environments who are okay to only have full-instance backups and limited functionality.
Design & Implementation follows the `networker` B&R plugin, which is simply:
- Implement B&R plugin interfaces
- Use cmd-answer pattern to execute backup and restore operations on KVM host when VM is running (or needs to be restored) - instead of a B&R API client, relies on answers from KVM agent which executes the operations
- Backups are full VM domain snapshots, copied to a VM-specific folders on a NAS target (NFS) along with a domain XML
- Backup uses libvirt feature: https://libvirt.org/kbase/live_full_disk_backup.html orchestrated via virsh/bash script (nasbackup.sh) as the libvirt-java lacks the bindings
- Supported instance volume storage for restore operations: NFS & local storage
Refer the doc PR for feature limitations and usage details:
https://github.com/apache/cloudstack-documentation/pull/429
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
During live migration of a VM from between hosts having different cgroup versions (cgroupv2 & cgroup), overcommit ratio is ignored.
This PR fixes the above issue.
* Add ability to set cpu.threadspercore similar to existing cpu.corespersocket
* add cpu.threadspercore to VM and template detail options
* Update plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* add vm detail for KVM
---------
Co-authored-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Allow overriding root diskoffering id & size while restoring VM
* UI changes
* Allow expunging of old disk while restoring a VM
* Resolve comments
* Address comments
* Duplicate volume's details while duplicating volume
* Allow setting IOPS for the new volume
* minor cleanup
* fixup
* Add checks for template size
* Replace strings for IOPS with constants
* Fix saveVolumeDetails method
* Fixup
* Fixup UI styling
This PR adds the capability in CloudStack to convert VMware Instances disk(s) to KVM using virt-v2v and import them as CloudStack instances. It enables CloudStack operators to import VMware instances from vSphere into a KVM cluster managed by CloudStack. vSphere/VMware setup might be managed by CloudStack or be a standalone setup.
CloudStack will let the administrator select a VM from an existing VMware vCenter in the CloudStack environment or external vCenter requesting vCenter IP, Datacenter name and credentials.
The migrated VM will be imported as a KVM instance
The migration is done through virt-v2v: https://access.redhat.com/articles/1351473, https://www.ovirt.org/develop/release-management/features/virt/virt-v2v-integration.html
The migration process timeout can be set by the setting convert.instance.process.timeout
Before attempting the virt-v2v migration, CloudStack will create a clone of the source VM on VMware. The clone VM will be removed after the registration process finishes.
CloudStack will delegate the migration action to a KVM host and the host will attempt to migrate the VM invoking virt-v2v. In case the guest OS is not supported then CloudStack will handle the error operation as a failure
The migration process using virt-v2v may not be a fast process
CloudStack will not perform any check about the guest OS compatibility for the virt-v2v library as indicated on: https://access.redhat.com/articles/1351473.
This pull request (PR) implements a Distributed Resource Scheduler (DRS) for a CloudStack cluster. The primary objective of this feature is to enable automatic resource optimization and workload balancing within the cluster by live migrating the VMs as per configuration.
Administrators can also execute DRS manually for a cluster, using the UI or the API.
Adds support for two algorithms - condensed & balanced. Algorithms are pluggable allowing ACS Administrators to have customized control over scheduling.
Implementation
There are three top level components:
Scheduler
A timer task which:
Generate DRS plan for clusters
Process DRS plan
Remove old DRS plan records
DRS Execution
We go through each VM in the cluster and use the specified algorithm to check if DRS is required and to calculate cost, benefit & improvement of migrating that VM to another host in the cluster. On the basis of cost, benefit & improvement, the best migration is selected for the current iteration and the VM is migrated. The maximum number of iterations (live migrations) possible on the cluster is defined by drs.iterations which is defined as a percentage (as a value between 0 and 1) of total number of workloads.
Algorithm
Every algorithms implements two methods:
needsDrs - to check if drs is required for cluster
getMetrics - to calculate cost, benefit & improvement of a migrating a VM to another host.
Algorithms
Condensed - Packs all the VMs on minimum number of hosts in the cluster.
Balanced - Distributes the VMs evenly across hosts in the cluster.
Algorithms use drs.level to decide the amount of imbalance to allow in the cluster.
APIs Added
listClusterDrsPlan
id - ID of the DRS plan to list
clusterid - to list plans for a cluster id
generateClusterDrsPlan
id - cluster id
iterations - The maximum number of iterations in a DRS job defined as a percentage (as a value between 0 and 1) of total number of workloads. Defaults to value of cluster's drs.iterations setting.
executeClusterDrsPlan
id - ID of the cluster for which DRS plan is to be executed.
migrateto - This parameter specifies the mapping between a vm and a host to migrate that VM. Format of this parameter: migrateto[vm-index].vm=<uuid>&migrateto[vm-index].host=<uuid>.
Config Keys Added
ClusterDrsPlanExpireInterval
Key drs.plan.expire.interval
Scope Global
Default Value 30 days
Description The interval in days after which old DRS records will be cleaned up.
ClusterDrsEnabled
Key drs.automatic.enable
Scope Cluster
Default Value false
Description Enable/disable automatic DRS on a cluster.
ClusterDrsInterval
Key drs.automatic.interval
Scope Cluster
Default Value 60 minutes
Description The interval in minutes after which a periodic background thread will schedule DRS for a cluster.
ClusterDrsIterations
Key drs.max.migrations
Scope Cluster
Default Value 50
Description Maximum number of live migrations in a DRS execution.
ClusterDrsAlgorithm
Key drs.algorithm
Scope Cluster
Default Value condensed
Description DRS algorithm to execute on the cluster. This PR implements two algorithms - balanced & condensed.
ClusterDrsLevel
Key drs.imbalance
Scope Cluster
Default Value 0.5
Description Percentage (as a value between 0.0 and 1.0) of imbalance allowed in the cluster. 1.0 means no imbalance
is allowed and 0.0 means imbalance is allowed.
ClusterDrsMetric
Key drs.imbalance.metric
Scope Cluster
Default Value memory
Description The cluster imbalance metric to use when checking the drs.imbalance.threshold. Possible values are memory and cpu.
This PR adds two vm setting for user vms on KVM
- nic multiqueue number
- packed virtqueues enabled . optional are true and false (false by default). It requires qemu>=4.2.0 and libvirt >=6.3.0
Tested ok on ubuntu 22 and rocky 8.4