* New feature: Change storage pool scope
* Added checks for Ceph/RBD
* Update op_host_capacity table on primary storage scope change
* Storage pool scope change integration test
* Pull 8875: Addressed review comments
* Pull 8875: remove storage checks, AbstractPrimaryStorageLifeCycleImpl class
* Pull 8875: Fixed integration test failure
* Pull 8875: Review comments
* Pull 8875: review comments + broke changeStoragePoolScope into smaller functions
* Added UT for changeStoragePoolScope
* Rename AbstractPrimaryDataStoreLifeCycleImpl to BasePrimaryDataStoreLifeCycleImpl
* Pull 8875: Dao review comments
* Pull 8875: Rename changeStoragePoolScope.vue to ChangeStoragePoolScope.vue
* Pull 8875: Created a new smoke test file + a single warning msg in UI
* Pull 8875: Added cleanup in test_primary_storage_scope.py
* Pull 8875: Typo in en.json
* Pull 8875: cleanup array in test_primary_storage_scope.py
* Pull 8875: Removing extra whitespace at EOF of StorageManagerImplTest
* Pull 8875: Added UT for PrimaryDataStoreHelper and BasePrimaryDataStoreLifeCycleImpl
* Pull 8875: Added license header
* Pull 8875: Fixed sql query for vmstates
* Pull 8875: Changed icon plus info on disabled mode in apidoc
* Pull 8875: Change scope should not work for local storage
* Pull 8875: Change scope completion event
* Pull 8875: Added api findAffectedVmsForStorageScopeChange
* Pull 8875: Added UT for findAffectedVmsForStorageScopeChange and removed listByPoolIdVMStatesNotInCluster
* Pull 8875: Review comments + Vm name in response
* Pull 8875: listByVmsNotInClusterUsingPool was returning duplicate VM entries because of multiple volumes in the VM satisfying the criteria
* Pull 8875: fixed listAffectedVmsForStorageScopeChange UT
* listAffectedVmsForStorageScopeChange should work if the pool is not disabled
* Fix listAffectedVmsForStorageScopeChangeTest UT
* Pull 8875: add volume.removed not null check in VmsNotInClusterUsingPool query
* Pull 8875: minor refactoring in changeStoragePoolScopeToCluster
* Update server/src/main/java/com/cloud/storage/StorageManagerImpl.java
* fix eof
* changeStoragePoolScopeToZone should connect pool to all Up hosts
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
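A minimal sketch of the zone-scope flow described in the last item above, connecting the pool to every Up host before persisting the new scope. All type and method names here (Host, PoolOps, connectPoolToHost, updatePoolScope) are hypothetical stand-ins; the actual logic lives in CloudStack's StorageManagerImpl:
```java
// Illustrative sketch only; names are hypothetical stand-ins for the
// CloudStack internals touched by this change.
import java.util.List;

public class ChangeScopeSketch {
    enum HostStatus { UP, DOWN, DISCONNECTED }
    record Host(long id, HostStatus status) {}

    interface PoolOps {
        List<Host> listHostsInZone(long zoneId);
        void connectPoolToHost(long poolId, long hostId);
        void updatePoolScope(long poolId, String scope);
    }

    // Zone-wide scope change: connect the pool to every host in the zone
    // that is currently Up, then persist the new scope.
    static void changeStoragePoolScopeToZone(PoolOps ops, long poolId, long zoneId) {
        for (Host host : ops.listHostsInZone(zoneId)) {
            if (host.status() == HostStatus.UP) {
                ops.connectPoolToHost(poolId, host.id());
            }
        }
        ops.updatePoolScope(poolId, "ZONE");
    }
}
```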
* Mitigation for non-scalable Powerflex/ScaleIO clients
- Added ScaleIOSDCManager to manage SDC connections: it checks the client limit and prepares/unprepares SDC on the hosts.
- Added commands to prepare and unprepare storage clients, which prepare/start and stop the SDC service respectively on the hosts.
- Introduced config 'storage.pool.connected.clients.limit' at the storage level for client limits, currently supported for PowerFlex only.
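A hedged sketch of the client-limit gate described above, combined with the system-id lock mentioned in the commits that follow. The ScaleIoApi interface and its methods are hypothetical; the real logic lives in ScaleIOSDCManager:
```java
// Illustrative sketch of gating SDC preparation on a connected-clients limit.
// Names (ScaleIoApi, prepareSdc) are hypothetical stand-ins.
import java.util.concurrent.locks.ReentrantLock;

public class SdcLimitSketch {
    interface ScaleIoApi {
        int countConnectedSdcs(String systemId);
        void prepareSdc(String systemId, long hostId); // start SDC service on the host
    }

    private final ReentrantLock systemLock = new ReentrantLock();

    boolean prepareIfUnderLimit(ScaleIoApi api, String systemId, long hostId, int clientLimit) {
        systemLock.lock(); // held until SDC preparation completes, per the change above
        try {
            if (clientLimit > 0 && api.countConnectedSdcs(systemId) >= clientLimit) {
                return false; // limit reached; do not connect another client
            }
            api.prepareSdc(systemId, hostId);
            return true;
        } finally {
            systemLock.unlock();
        }
    }
}
```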
* tests issue fixed
* refactor / improvements
* lock on the PowerFlex system id while checking the connection limit
* updated the PowerFlex system id lock to be held until SDC preparation
* Added custom stats support for storage pool, through listStoragePools API
* code improvements, and unit tests
* unit tests fixes
* Update config 'storage.pool.connected.clients.limit' to dynamic, and some improvements
* Stop SDC on host after migration if no volumes mapped to host
* Wait for SDC to connect after scini service start, and some log improvements
* Do not throw exception (log it) when SDC is not connected while revoking access for the powerflex volume
* some log improvements
* Restart agent when host comes out of maintenance
* Don't send CreateStoragePoolCommand to hosts in maintenance mode
* CreateStoragePoolCommand can run when host in maintenance. Reverted the change to restart agent when host was already up and in maintenance
* Reverted changes done to ResourceManagerImplTest
* Create/Export OVA file of the VM on external vCenter host, to temporary conversion location (NFS)
* Fixed ova issue on untar/extract ovf from ova file
"tar -xf" cmd on ova fails with "ovf: Not found in archive" while extracting ovf file
* Updated VMware to KVM instance migration using OVA
* Refactoring and cleanup
* test fixes
* Consider zone wide pools in the destination cluster for instance conversion
* Remove local storage pool support as temporary conversion location
- OVA export is not possible as the pool is not accessible outside the host; NFS pools are supported.
* cleanup unused code
* some improvements, and refactoring
* import nic unit tests
* vmware guru unit tests
* Separate clone VM and create template file for VMware migration
- Exporting the OVA (of the cloned VM) to the conversion location takes time.
- Do any validations on the cloned VM before creating the template (and fail early).
- Updated unit tests.
* Check conversion support on host before clone vm / create template on vmware (and fail early)
* minor code improvements
* Auto select the host with instance conversion capability
* Skip instance conversion supported response param for non-KVM hosts
* Show supported conversion hosts in the UI
* Skip persistence map update if network doesn't exist
* Added support to export OVA from KVM host, through ovftool (when installed in KVM host)
* Updated importvm api param 'usemsforovaexport' to 'forcemstodownloadvmfiles', to be generic
* Updated hardcoded UI messages with message labels
* Updated UI to support importvm api param - forcemstodownloadvmfiles
* Improved instance conversion support checks on ubuntu hosts, and for windows guest vms
* Use OVF template (VM disks and spec files) for instance conversion from VMware, instead of OVA file
- this would further increase the migration performance (as it reduces the time for OVA preparation / archiving of the VM files into a single file)
* OVF export tool parallel threads code improvements
* Updated 'convert.vmware.instance.to.kvm.timeout' config default value to 3 hrs
* Config values check & code improvements
* Updated import log, with time taken and vm details
* Support for parallel downloads of VMware VM disk files while exporting OVF from MS, and other changes below.
- Skip clone for powered off VMs
- Fixes to support standalone host (with its default datacenter)
- Some code improvements
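A hedged sketch of the parallel-download approach named above: fetching several VM disk files concurrently with a bounded thread pool. The URLs and thread count are placeholders; the actual implementation and its configuration knobs live in the VMware import code:
```java
// Illustrative sketch: downloading disk files concurrently with a bounded pool.
import java.io.InputStream;
import java.net.URI;
import java.nio.file.*;
import java.util.List;
import java.util.concurrent.*;

public class ParallelDownloadSketch {
    public static void main(String[] args) throws Exception {
        List<String> diskUrls = List.of(
                "https://vcenter.example/disk-1.vmdk",   // placeholder URLs
                "https://vcenter.example/disk-2.vmdk");
        ExecutorService pool = Executors.newFixedThreadPool(2); // configurable thread count
        try {
            List<Future<Path>> results = new java.util.ArrayList<>();
            for (String url : diskUrls) {
                results.add(pool.submit(() -> {
                    Path target = Paths.get("/tmp",
                            Paths.get(URI.create(url).getPath()).getFileName().toString());
                    try (InputStream in = URI.create(url).toURL().openStream()) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return target;
                }));
            }
            for (Future<Path> f : results) {
                System.out.println("Downloaded " + f.get()); // surfaces failures early
            }
        } finally {
            pool.shutdown();
        }
    }
}
```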
* rebase fixes
* rebase fixes
* minor improvement
* code improvements - threads configuration, and api parameter changes to import vm files
* typo fix in error msg
* prevent an NPE on an uninitialised TemplateObject
* move npe handler up-stack
* Update engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/store/TemplateObject.java
* catch yet one level up
* Update engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/VolumeOrchestrator.java
* Update engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/store/TemplateObject.java
* extra guard
* Revert "prevent an NPE on an uninitialised TemplateObject"
This reverts commit e602a65ea62e4707828483a4ddea288d81ff06f5.
* Veeam: find storage pool by path for PreSetup and VMFS
* Veeam: support VMware distributed virtual switch
* Veeam: sync volumes on Solidfire after backup restoration
A user faced an issue where the backup is restored but the DATA disk is gone (the ROOT disk is OK):
```
2024-05-03 12:00:32,868 ERROR [o.a.c.b.BackupManagerImpl] (API-Job-Executor-13:ctx-aa8a1d85 job-149661 ctx-73328567) (logid:6510cf06) Failed to import VM [vmInternalName: i-169-9679-VM] from backup restoration [{"backupType":"Full","externalId":"821ca400-a5da-4282-bf3f-7c7e38a6cdb4","id":257,"uuid":"69399101-5cbd-461c-8a48-f0c70eac0b24","vmId":9679}] with hypervisor [type: VMware] due to: [Couldn't find storage pool -iqn.2010-01.com.solidfire:3p53.data-9679.221-0].
```
On managed storage, the datastore name of the DATA disk is determined by the iscsi_name of the volume.
* Veeam: set correct path for DATA disks on solidfire
* Temporarily backup StorPool volume before expunge
Sometimes users delete volumes by mistake. This enhancement
provides a way to back up the volume before it is deleted. The user
will be able to see the snapshot in the CloudStack UI/CLI and create only a
volume from it.
A task will check (by default every 5 minutes) whether the snapshots have been
deleted from StorPool.
Global settings to enable the delayed delete option:
`storpool.delete.after.interval` - The interval (in seconds) after which the StorPool snapshot will be deleted
`storpool.list.snapshots.delete.after.interval` - The interval (in seconds) at which to fetch the StorPool snapshots with the deleteAfter flag
Minor fix when deleting snapshots
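A hedged sketch of the delayed-delete task described above: a periodic poll that deletes snapshots whose deleteAfter timestamp has passed. The StorPoolClient interface and its methods are hypothetical stand-ins for the plugin's internals:
```java
// Illustrative sketch of the delayed-delete flow; names are hypothetical.
import java.time.Instant;
import java.util.List;
import java.util.concurrent.*;

public class DeleteAfterSketch {
    interface StorPoolClient {
        List<String> listSnapshotsWithDeleteAfter();   // snapshots flagged for delayed delete
        Instant deleteAfterOf(String snapshot);
        void deleteSnapshot(String snapshot);
    }

    static void schedule(StorPoolClient client, long pollSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // storpool.list.snapshots.delete.after.interval controls this poll period
        ses.scheduleAtFixedRate(() -> {
            for (String snap : client.listSnapshotsWithDeleteAfter()) {
                // storpool.delete.after.interval determines the deleteAfter timestamp
                if (Instant.now().isAfter(client.deleteAfterOf(snap))) {
                    client.deleteSnapshot(snap);
                }
            }
        }, pollSeconds, pollSeconds, TimeUnit.SECONDS);
    }
}
```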
* added Apache license
* addressed comments
* Ability to specify NFS mount options while adding a primary storage and modify it later
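A hedged sketch of how such mount options might be rendered into a libvirt netfs pool definition. The fs namespace shown requires a reasonably recent libvirt, and the option values here are placeholders, not taken from this change:
```java
// Illustrative sketch: emitting a libvirt netfs pool mount-options XML fragment.
import java.util.Set;

public class MountOptsSketch {
    static String mountOptsXml(Set<String> opts) {
        StringBuilder sb = new StringBuilder("<fs:mount_opts>\n");
        for (String opt : opts) {
            sb.append("  <fs:option name='").append(opt).append("'/>\n");
        }
        return sb.append("</fs:mount_opts>\n").toString();
    }

    public static void main(String[] args) {
        // the <pool> element must declare
        // xmlns:fs='http://libvirt.org/schemas/storagepool/fs/1.0'
        System.out.print(mountOptsXml(Set.of("vers=4.1", "nconnect=4")));
    }
}
```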
* Pull 8947: Rename all occurrences of nfsopt to nfsMountOpt and added nfsMountOpts to ApiConstants
* Pull 8947: Refactor code - move into separate methods
* Pull 8947: CollectionsUtils.isNotEmpty and switch statement in LibvirtStoragePoolDef.java
* Pull 8947: UI - cancel maintenance will remount the storage pool and apply the options
* Pull 8947: UI - moved edit NFS mount options to edit Primary Storage form
* Pull 8947: UI - moved 'NFS Mount Options' to below 'Type' in dataview
* Pull 8947: Fixed message in AddPrimaryStorage.vue
* Pull 8947: Convert _nfsmountOpts to Set in LibvirtStoragePoolDef
* Pull 8947: Throw exception and log error if mount fails due to incorrect mount option
* Pull 8947: Added UT and moved integration test to component/maint
* Pull 8947: Review comments
* Pull 8947: Removed password from integration test
* Pull 8947: move details allocation to inside the if block in getStoragePoolNFSMountOpts
* Pull 8947: Fixed a bug in AddPrimaryStorage.vue
* Pull 8947: Pool should remain in maintenance mode if mount fails
* Pull 8947: Removed password from integration test
* Pull 8947: Added UT
* Pull 8875: Fixed a bug in CloudStackPrimaryDataStoreLifeCycleImplTest
* Pull 8875: Fixed a bug in LibvirtStoragePoolDefTest
* Pull 8947: minor code restructuring
* Pull 8947: added some UTs for coverage
* Fix LibvirtStorageAdapterTest UT
* Updates to change Pure and Primera to host-centric vlun assignments; various small bug fixes
* update to add timestamp when deleting pure volumes to avoid future conflicts
* update to migrate to properly check disk offering is valid for the target storage pool
* improve error handling when copying volumes to add precision to which step failed
* rename pure volume before delete to avoid conflicts if the same name is used before it's expunged on the array
* remove dead code in AdaptiveDataStoreLifeCycleImpl.java
* Fix issues found in PR checks
* fix session refresh TTL logic
* updates from PR comments
* logic to delete by path ONLY on supported OUI
* fix to StorageSystemDataMotionStrategy compile error
* change noisy debug message to trace message
* fix double callback call in handleVolumeMigrationFromNonManagedStorageToManagedStorage
* fix for flash array delete error
* fix typo in StorageSystemDataMotionStrategy
* change copyVolume to use writeback to speed up copy ops
* remove returning PrimaryStorageDownloadAnswer when connectPhysicalDisk returns false during KVMStorageProcessor template copy
* remove change to only set UUID on snapshot if it is a vmSnapshot
* reverting change to UserVmManagerImpl.configureCustomRootDiskSize
* add error checking/simplification per comments from @slavkap
* Update engine/storage/datamotion/src/main/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategy.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* address PR comments from @sureshanaparti
---------
Co-authored-by: GLOVER RENE <rg9975@cs419-mgmtserver.rg9975nprd.app.ecp.att.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
During live migration of a VM between hosts having different cgroup versions (cgroup v1 & cgroup v2), the overcommit ratio is ignored.
This PR fixes the above issue.
With d8c7e34b38, options were added to the host.allocators.order config. Currently, it allows adding only FirstFitRouting as the value. This PR fixes the behaviour and allows other host allocators to be added.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Got an exception when deleting a zone
```
com.cloud.utils.exception.CloudRuntimeException: The zone cannot be deleted because there are Secondary storages in this zone
```
- Changes behaviour of details param handling via global setting:
- listVirtualMachines API: when the details param is not provided, whether stats are returned is controlled by a new global setting `list.vm.default.details.stats`
- listVirtualMachinesMetrics API: when the details param is not provided, it uses `all` details including `stats`
- Users affected by slow listVirtualMachines API response times can set `list.vm.default.details.stats` to `false`
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was renamed to `vm.stats.increment.metrics` in #5984, and also remove unused/unnecessary global settings via the upgrade path
- Changes the default value of the VM stats accumulation setting `vm.stats.increment.metrics` to false until a better solution emerges. Since #5984, this has been true, and during the execution of listVM APIs the stats are clubbed/calculated, which can immensely slow down list VM API calls. Any costly operation such as summing of stats shouldn't be done during the course of a synchronous API, such as the list VM API.
- Fix the UI that uses listVirtualMachinesMetrics so it does not request the `stats` detail when in list view without metrics selected.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The user_vm_view can end up not picking the right index to join against
the user_ip_address table, causing a full table scan on the user_ip_address
table. This could be related to a MySQL bug:
https://bugs.mysql.com/bug.php?id=41220
In a test environment with 20k shared networks and over 20M IPs, the
listVirtualMachines API was found to take over 17s to return list of
just 10 VMs. However, with this fix it would now take under 200ms to
return the list.
MySQL slow query logging showed nearly 20M rows examined from the IP
address table:
```
# User@Host: cloud[cloud] @ localhost [127.0.0.1] Id: 39
# Query_time: 8.227541 Lock_time: 0.000014 Rows_sent: 12 Rows_examined: 19,667,235
SET timestamp=1715410270;
SELECT user_vm_view.id, user_vm_view.name /*snipped*/ FROM user_vm_view
WHERE user_vm_view.id IN (4,6,7,8,9,10,11,12,13,14,15,16);
```
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* kvm: replace ISO path in vm XML configuration during vm migration
* Update 9212: address comments
* kvm: fix vm migration if there are multiple image stores
* list by isEncrypted
* use filter on VO and cleanup
* add encryption type to volume response
* Update api/src/main/java/org/apache/cloudstack/api/command/user/volume/ListVolumesCmd.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
Co-authored-by: Bryan Lima <bryan.lima@hotmail.com>
Co-authored-by: SadiJr <sadi@scclouds.com.br>
Co-authored-by: Bryan Lima <42067040+BryanMLima@users.noreply.github.com>
Co-authored-by: Henrique Sato <henriquesato2003@gmail.com>
* Allow overriding root disk offering id & size while restoring VM
* UI changes
* Allow expunging of old disk while restoring a VM
* Resolve comments
* Address comments
* Duplicate volume's details while duplicating volume
* Allow setting IOPS for the new volume
* minor cleanup
* fixup
* Add checks for template size
* Replace strings for IOPS with constants
* Fix saveVolumeDetails method
* Fixup
* Fixup UI styling
Add a global setting to control whether redirection is allowed while
downloading templates and volumes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
core: some changes on SimpleHttpMultiFileDownloader
similar to HttpTemplateDownloader
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
(cherry picked from commit b1642bc3bf)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Move allow.additional.vm.configuration.list.kvm from Global to Account setting
- Disallow VM details starting with "extraconfig" when deploying VMs
- Skip changes to VM details starting with "extraconfig" when updating VM settings
- Allow only extraconfig for DPDK in service offering details
- Check if extraconfig values in VM details are supported when starting VMs
- Check if extraconfig values in service offering details are supported when starting VMs
- Disallow add/edit/update of VM settings for extraconfig in the UI
(cherry picked from commit e6e4fe16fb)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 7aea9db1c8)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* storage,plugins: delegate allow zone-wide volume migration check and access grant to storage drivers
Following checks have been delegated to storage drivers,
- For volumes on zone-wide storage, whether they need storage migration when the VM is migrated
- Whether a volume requires an access grant
Apply fixes in resolving PrimaryDataStore
* add tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unused import
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Update engine/orchestration/src/test/java/org/apache/cloudstack/engine/orchestration/VolumeOrchestratorTest.java
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This fixes https://github.com/apache/cloudstack/issues/8595
```
2024-02-01 16:23:52,473 INFO [c.c.n.s.SecurityGroupManagerImpl] (AgentManager-Handler-16:null) (logid:) Network Group full sync for agent 1 found 3 vms out of sync
2024-02-01 16:23:52,473 DEBUG [c.c.n.s.SecurityGroupManagerImpl] (AgentManager-Handler-16:null) (logid:) Security Group Mgr v2: scheduling ruleset updates for 3 vms (unique=3), current queue size=0
2024-02-01 16:23:52,473 DEBUG [c.c.n.s.SecurityGroupManagerImpl] (AgentManager-Handler-16:null) (logid:) Security Group Mgr v2: done scheduling ruleset updates for 3 vms: num new jobs=3 num rows insert or updated=0 time taken=0
2024-02-01 16:23:52,478 ERROR [c.c.n.s.SecurityGroupManagerImpl] (SecGrp-Worker-20:ctx-0aa3885d) (logid:472b30d2) Problem during SG work com.cloud.network.security.LocalSecurityGroupWorkQueue$LocalSecurityGroupWork@5
com.cloud.utils.exception.CloudRuntimeException: DB Exception on: com.mysql.cj.jdbc.ClientPreparedStatement: SELECT SQL_CACHE security_group_vm_map.id, security_group_vm_map.security_group_id, security_group_vm_map.instance_id, nics.ip4_address, vm_instance.state, security_group.name FROM security_group_vm_map INNER JOIN nics ON security_group_vm_map.instance_id=nics.instance_id INNER JOIN vm_instance ON security_group_vm_map.instance_id=vm_instance.id INNER JOIN security_group ON security_group_vm_map.security_group_id=security_group.id WHERE security_group_vm_map.security_group_id = 3 AND vm_instance.state='Running'
at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:424)
at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:938)
at com.cloud.utils.db.GenericDaoBase.listBy(GenericDaoBase.java:928)
at com.cloud.network.security.dao.SecurityGroupVMMapDaoImpl.listBySecurityGroup(SecurityGroupVMMapDaoImpl.java:134)
at jdk.internal.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
at com.sun.proxy.$Proxy245.listBySecurityGroup(Unknown Source)
at com.cloud.network.security.SecurityGroupManagerImpl2.generateRulesForVM(SecurityGroupManagerImpl2.java:246)
at com.cloud.network.security.SecurityGroupManagerImpl2.sendRulesetUpdates(SecurityGroupManagerImpl2.java:177)
at com.cloud.network.security.SecurityGroupManagerImpl2.work(SecurityGroupManagerImpl2.java:157)
at com.cloud.network.security.SecurityGroupManagerImpl2$WorkerThread$1.run(SecurityGroupManagerImpl2.java:75)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at com.cloud.network.security.SecurityGroupManagerImpl2$WorkerThread.run(SecurityGroupManagerImpl2.java:72)
Caused by: java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '.id, security_group_vm_map.security_group_id, security_group_vm_map.instance_id,' at line 1
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
... 28 more
```
* Introduced a new API checkVolumeAndRepair that allows users or admins to check volumes and repair them if any leaks are observed.
Currently this is supported only for KVM
* some fixes
* Added unit tests
* addressed review comments
* add repair volume while granting access
* Changed repair parameter to accept both leaks/all
* Introduced new global setting volume.check.and.repair.before.use to do volume check and repair before VM start or volume attach operations
* Added volume check and repair changes only during VM start and volume attach operations
* Refactored the names to look similar across the code
* Some code fixes
* remove unused code
* Renamed repair values
* Fixed unit tests
* changed version
* Address review comments
* Code refactored
* used volume name in logs
* Changed the API to Async and the setting scope to storage pool
* Fixed exit value handling with check volume command
* Fixed storage scope to the setting
* Fix volume format issues
* Refactored the log messages
* Fix formatting
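On KVM, the check/repair flow described above, with its `leaks`/`all` repair values, maps naturally onto `qemu-img check`. A hedged sketch, assuming qemu-img is on the PATH and the volume is a local qcow2 file; the real implementation handles exit values and output parsing more carefully:
```java
// Illustrative sketch only: invoking qemu-img check with the repair modes
// ('leaks' or 'all') that the API's repair parameter accepts.
import java.io.IOException;

public class VolumeCheckSketch {
    static int checkAndRepair(String volumePath, String repair)
            throws IOException, InterruptedException {
        ProcessBuilder pb = (repair == null)
                ? new ProcessBuilder("qemu-img", "check", volumePath)                 // check only
                : new ProcessBuilder("qemu-img", "check", "-r", repair, volumePath);  // repair leaks/all
        pb.inheritIO();
        // qemu-img check exits 0 when the image is clean; non-zero signals errors/leaks
        return pb.start().waitFor();
    }
}
```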
* 4.18:
Storage plugin support to check if volume on datastore requires access for migration (#8655)
CKS: fix /opt/bin/deploy-cloudstack-secret in CKS control nodes (#8697)
* Check if volume on datastore requires access for migration, and grant/revoke volume access if requires
* Updated default implementation for requiresAccessForMigration method in PrimaryDataStoreDriver
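A hedged sketch of what a conservative default for the driver interface method named above might look like; the parameter list shown is an assumption, not the exact CloudStack signature:
```java
// Hedged sketch of a default for the driver interface method named above.
interface PrimaryDataStoreDriverSketch {
    default boolean requiresAccessForMigration(Object dataObject) {
        return false; // drivers that need access grants during migration override this
    }
}
```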
This PR fixes a bug introduced in #8502. The timeout for script execution was set to 60 ms instead of 60s, which resulted in the host not getting UEFI enabled. This is a blocker for the 4.19 release.
We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.
We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).
For ModifyStoragePoolCommand, we don't externalize the timeout, to avoid confusion for the user, since the required timeout can vary depending on the provider in use and we are only setting the wait for the default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5`, giving a default value of 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.
Note: the actual time the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)
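The arithmetic described above, as a small sketch (the values are the documented defaults, not read from a live configuration):
```java
// Illustrative arithmetic for the waits described above.
public class WaitSketch {
    public static void main(String[] args) {
        int globalWait = 1800;                  // global 'wait' setting, seconds
        int modifyPoolWait = globalWait / 5;    // 360s used for ModifyStoragePoolCommand
        int effectiveMax = 2 * modifyPoolWait;  // MS effectively waits up to ~2x the command wait
        System.out.printf("command wait=%ds, effective max=%ds%n", modifyPoolWait, effectiveMax);
    }
}
```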
This PR fixes several issues in the testing of Veeam 11 and Veeam12
- Import Veeam.Backup.PowerShell and silently ignore the warning messages
- Fix issue when assigning a VM to backup offerings, caused by the separator (\r\n)
- Fix authorization failure in Veeam 12a, because v1_4 is no longer supported in Veeam 12a
- Fix exception when the backup name has a space
- Fix backup metrics in Veeam 12, because the PowerShell command does not return the values needed
- Fix incorrect datetime value, because the PowerShell command returns a datetime which is not supported in Java
- Fix issue during backup restoration if the VM has both ROOT and DATA disks.
This PR also has the following updates:
- Add integration test test/integration/smoke/test_backup_recovery_veeam.py
- Make some UI changes
- Add zone setting backup.plugin.veeam.version. If it is not set, CloudStack will get the Veeam version via PowerShell commands.
- Add zone settings backup.plugin.veeam.task.poll.interval and backup.plugin.veeam.task.poll.max.retry
This PR fixes resources stuck in transitional states during async job cleanup.
Problem:
During maintenance of the management server, other servers in the cluster or the same server after a restart initiate async job cleanup. However, this process leaves resources in a transitional state. The only recovery option currently available is to make direct database changes.
Solution:
This PR introduces a resolution by moving Volume, Virtual Machine, and Network resources out of their transitional states. This adjustment enables failed operations to be reattempted without the need for manual database modifications.
There are a lot of test failures from test_vm_life_cycle.py in multiple PRs, due to no host being available for migration of VMs.
#8438 (comment)
#8433 (comment)
#7344 (comment)
While debugging I noticed that the hosts get stuck in Connecting state because MS is waiting for a response of the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (default timeout).
On the agent side, it gets stuck waiting for a response from the Script execution.
To reproduce, run smoke/test_vm_life_cycle.py (TestSecuredVmMigration test class to be specific). Once the tests are complete, you will notice that some hosts are stuck in Connecting state. And restarting the agent fails due to the named lock. Locks on DB can be checked using the below query.
```
SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE PROCESSLIST_ID <> CONNECTION_ID() \G;
```
This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock from database is released.
This PR fixes a regression caused by #8465 on advanced zones, import fails with:
2024-01-10 12:13:33,234 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Allocating nic for vm 142272e8-9e2e-407b-9d7e-e9a03b81653c in network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10} during import
2024-01-10 12:13:33,239 ERROR [o.a.c.v.UnmanagedVMsManagerImpl] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Failed to import NICs while importing vm: i-2-31-VM
com.cloud.exception.InsufficientVirtualNetworkCapacityException: Unable to acquire Guest IP address for network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10}Scope=interface com.cloud.dc.DataCenter; id=1
at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.importNic(NetworkOrchestrator.java:4582)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importNic(UnmanagedVMsManagerImpl.java:859)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importVirtualMachineInternal(UnmanagedVMsManagerImpl.java:1198)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstanceFromHypervisor(UnmanagedVMsManagerImpl.java:1511)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.baseImportInstance(UnmanagedVMsManagerImpl.java:1342)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstance(UnmanagedVMsManagerImpl.java:1282)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Also, addresses the VNC password field set instead of a fixed string
This PR fixes reorder/list pools when cluster details are not set, while deploying a VM / attaching a volume.
Problem:
Attaching a volume to a VM fails on infra with zone-wide pools & vm.allocation.algorithm=userdispersing, as the cluster details are not set (passed as null) while reordering / listing pools by volumes.
Solution:
Ignore cluster details when not set, while reordering / listing pools by volumes.
Fixes #8412
Add support for 8.0.0.2 explicitly to prevent falling back to the parent version
Adds a log when hypervisor capabilities fall back to the parent version
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
GuestOS mappings are retrieved from the parent hypervisor version when a minor/patch hypervisor version doesn't exist.
Fixes #8412
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This PR updates the conserve mode of the default VPC tier offering to conserve_mode=1,
so we can create both port forwarding and load balancing rules on a public IP in VPC tiers.
This fixes#8313
When a public IP gets removed from quarantine, the removal reason gets saved to the database; however, it may also be useful for operators to know who removed the public IP from quarantine. For that reason, this PR extends the public IP quarantine feature so that the account that deliberately removed an IP from quarantine also gets saved to the database.
1. Problem description
In Apache CloudStack (ACS), when a VM is deployed on a host with the KVM hypervisor, an XML file is created on the assigned host, which has a property, shares, that defines the weight of the VM for accessing the host CPU. The value of this property has no unit, and it is a relative measure used to calculate how much CPU a given VM will get on the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies in the range of valid shares values, which varies between the two versions: [2, 262144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both are specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000, and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default in current Linux distributions; thus, it is necessary to address this limitation.
Equation 1
shares = CPU * speed
Fixes: #6744
2. Proposed changes
To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, using the number of cores and the nominal speed of the host's CPU as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.
The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed with a value of 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.
Equation 2
shares = (VM requested shares * cgroup upper limit) / host max shares
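As an illustration (the numbers here are hypothetical, not taken from the proposal): a host with 32 cores at 2.5 GHz and a VM offering of 6 cores at 2 GHz give
```latex
% Worked example of Equations 1 and 2 with illustrative values
\text{host max shares} = 32 \times 2500 = 80000 \\
\text{VM requested shares} = 6 \times 2000 = 12000 \\
\text{shares} = \frac{12000 \times 10000}{80000} = 1500 \quad (\le 10000)
```
so the scaled value stays within the cgroup v2 range.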
To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM calculated shares will be verified if it does not surpass the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, the scale of VMs will also have the same verification for the VM's host.
To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency.
It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now.