1099 Commits

Author SHA1 Message Date
dahn bb79f0b727
engine/schema: create default network offering for vpc tier with conserve_mode=1 for fresh installation (#10744) (#10843)
Co-authored-by: Wei Zhou <weizhou@apache.org>
2025-05-27 08:17:49 +02:00
slavkap c183fc9859
Prevent data corruption for StorPool volumes (#10799) 2025-05-16 10:02:33 +02:00
Abhishek Kumar 919c9797cc
server: prevent duplicate HA works and alerts (#10624)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-05-06 10:42:30 +02:00
Fabricio Duarte 9d263cd71b
Network Usage event model adjustments (#10755) 2025-04-26 17:35:28 +02:00
Wei Zhou 7b68615bd9
HA: set correct hostId of HA work for vm migration (#10591) 2025-04-17 10:02:46 +02:00
Daan Hoogland 4a3686297d Updating pom.xml version numbers for release 4.19.3.0-SNAPSHOT
Signed-off-by: Daan Hoogland <daan@onecht.net>
2025-02-25 10:43:11 +01:00
Daan Hoogland 4e321d4356 Updating pom.xml version numbers for release 4.19.2.0
Signed-off-by: Daan Hoogland <daan@onecht.net>
2025-02-20 09:32:07 +01:00
Rene Glover 3337f425ff
Primera pure patches & various small fixes (#10132)
Co-authored-by: GLOVER RENE <rg9975@cs419-mgmtserver.rg9975nprd.app.ecp.att.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
2025-02-07 13:19:34 +01:00
Wei Zhou fbb1ff78d6
Static Routes: fix check on wrong global configuration (#10066) 2025-01-31 11:04:13 +01:00
dahn f652ad0d98
extra null guard (#10264) 2025-01-27 14:14:31 +01:00
dahn 0a77eb7f85
deal with NPE during host reconnect (#10158)
* log to see what command is being processed

* exception names
2025-01-24 15:39:56 +05:30
Suresh Kumar Anaparti b4ad04badf
Allow config drive deletion of migrated VM, on host maintenance (#10045) 2024-12-18 09:12:28 +01:00
Suresh Kumar Anaparti 3faf7cd2f1
Updating pom.xml version numbers for release 4.19.2.0-SNAPSHOT
Signed-off-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-07-19 10:29:26 +05:30
Suresh Kumar Anaparti 9f4c895974
Updating pom.xml version numbers for release 4.19.1.0
Signed-off-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-07-15 17:19:29 +05:30
Vishesh bcbf152a05
Merge branch '4.18' into 4.19 2024-06-28 20:14:21 +05:30
Abhisar Sinha 644f3a3f48
Add, Delete Storage Pool commands should be able to execute on a host in maintenance (#9301)
* Restart agent when host comes out of maintenance

* Don't send CreateStoragePoolCommand to hosts in maintenance mode

* CreateStoragePoolCommand can run when host in maintenance. Reverted the change to restart agent when host was already up and in maintenance

* Reverted changes done to ResourceManagerImplTest
2024-06-28 18:18:08 +05:30
Abhisar Sinha 646c894ec6
Fix for race when automatically assigning IP to Vms (#9240)
* Fix for race when automatically assigning IP to Vms

* code refactor
2024-06-28 17:11:16 +05:30
Suresh Kumar Anaparti 46f672563e
Improve migration of external VMware VMs into KVM cluster (#8815)
* Create/Export OVA file of the VM on external vCenter host, to temporary conversion location (NFS)

* Fixed ova issue on untar/extract ovf from ova file
"tar -xf" cmd on ova fails with "ovf: Not found in archive" while extracting ovf file

* Updated VMware to KVM instance migration using OVA

* Refactoring and cleanup

* test fixes

* Consider zone wide pools in the destination cluster for instance conversion

* Remove local storage pool support as temporary conversion location
- OVA export not possible as the pool is not accessible outside host, NFS pools are supported.

* cleanup unused code

* some improvements, and refactoring

* import nic unit tests

* vmware guru unit tests

* Separate clone VM and create template file for VMware migration
- Export OVA (of the cloned VM) to the conversion location takes time.
- Do any validations with cloned VM before creating the template (and fail early).
- Updated unit tests.

* Check conversion support on host before clone vm / create template on vmware (and fail early)

* minor code improvements

* Auto select the host with instance conversion capability

* Skip instance conversion supported response param for non-KVM hosts

* Show supported conversion hosts in the UI

* Skip persistence map update if network doesn't exist

* Added support to export OVA from KVM host, through ovftool (when installed in KVM host)

* Updated importvm api param 'usemsforovaexport' to 'forcemstodownloadvmfiles', to be generic

* Updated hardcoded UI messages with message labels

* Updated UI to support importvm api param - forcemstodownloadvmfiles

* Improved instance conversion support checks on ubuntu hosts, and for windows guest vms

* Use OVF template (VM disks and spec files) for instance conversion from VMware, instead of OVA file
 - this would further increase the migration performance (as it reduces the time for OVA preparation / archiving of the VM files into a single file)

* OVF export tool parallel threads code improvements

* Updated 'convert.vmware.instance.to.kvm.timeout' config default value to 3 hrs

* Config values check & code improvements

* Updated import log, with time taken and vm details

* Support for parallel downloads of VMware VM disk files while exporting OVF from MS, and other changes below.
- Skip clone for powered off VMs
- Fixes to support standalone host (with its default datacenter)
- Some code improvements

* rebase fixes

* rebase fixes

* minor improvement

* code improvements - threads configuration, and api parameter changes to import vm files

* typo fix in error msg
2024-06-27 21:14:13 +05:30
Abhishek Kumar b22315db85
server: event for HA vm start (#9202) 2024-06-26 15:38:47 +05:30
Vishesh dc74d5ba88
Let network guru decide if ipv6 cidr size can't be equal to 64 (#9289) 2024-06-26 02:43:26 +05:30
Rene Glover 6ee6603359
Updates to HPE-Primera and Pure FlashArray Drivers to use Host-based VLUN Assignments (#8889)
* Updates to change Pure and Primera to host-centric vlun assignments; various small bug fixes

* update to add timestamp when deleting pure volumes to avoid future conflicts

* update to migrate to properly check disk offering is valid for the target storage pool

* Updates to change Pure and Primera to host-centric vlun assignments; various small bug fixes

* update to add timestamp when deleting pure volumes to avoid future conflicts

* update to migrate to properly check disk offering is valid for the target storage pool

* improve error handling when copying volumes to add precision to which step failed

* rename pure volume before delete to avoid conflicts if the same name is used before it's expunged on the array

* remove dead code in AdaptiveDataStoreLifeCycleImpl.java

* Fix issues found in PR checks

* fix session refresh TTL logic

* updates from PR comments

* logic to delete by path ONLY on supported OUI

* fix to StorageSystemDataMotionStrategy compile error

* change noisy debug message to trace message

* fix double callback call in handleVolumeMigrationFromNonManagedStorageToManagedStorage

* fix for flash array delete error

* fix typo in StorageSystemDataMotionStrategy

* change copyVolume to use writeback to speed up copy ops

* remove returning PrimaryStorageDownloadAnswer when connectPhysicalDisk returns false during KVMStorageProcessor template copy

* remove change to only set UUID on snapshot if it is a vmSnapshot

* reverting change to UserVmManagerImpl.configureCustomRootDiskSize

* add error checking/simplification per comments from @slavkap

* Update engine/storage/datamotion/src/main/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategy.java

Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>

* address PR comments from @sureshanaparti

---------

Co-authored-by: GLOVER RENE <rg9975@cs419-mgmtserver.rg9975nprd.app.ecp.att.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
2024-06-25 10:35:39 +05:30
Vishesh 351de5fabd
engine/orchestration: Update overcommit ratio during live VM migration (#9178)
During live migration of a VM between hosts having different cgroup versions (cgroupv2 & cgroup), the overcommit ratio is ignored.

This PR fixes the above issue.
2024-06-24 20:45:31 +05:30
Suresh Kumar Anaparti 9055610034
Remove duplicate network state checks before shutdown network (#8462) 2024-06-21 14:12:07 +02:00
Abhishek Kumar abbc61c01e
engine-orchestration: expunge destroyed system vm volume (#9197)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-06-13 10:00:22 +02:00
Vishesh 6b4955affe
Fix message publish in transaction (#8980)
* Fix message publish in transaction

* Resolve comments
2024-05-07 13:27:31 +05:30
Vishesh 80a8b80a9d
Update volume's passphrase to null if diskOffering doesn't support encryption (#8904) 2024-04-29 12:18:09 +05:30
SadiJr 96ae479000
[Usage] Create network billing (#7236)
Co-authored-by: Bryan Lima <bryan.lima@hotmail.com>
Co-authored-by: SadiJr <sadi@scclouds.com.br>
Co-authored-by: Bryan Lima <42067040+BryanMLima@users.noreply.github.com>
Co-authored-by: Henrique Sato <henriquesato2003@gmail.com>
2024-04-24 08:52:49 +02:00
Wei Zhou 0b857def68
New feature: Import/Unmanage DATA volume from storage pool (#8808) 2024-04-23 16:05:59 +02:00
João Jandre 8a101fbbc1 Updating pom.xml version numbers for release 4.18.3.0-SNAPSHOT
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-04-17 11:11:57 -03:00
Vishesh 44aa08c02a
Fixup 4.19 build issue (#8905) 2024-04-12 16:37:25 +02:00
Vishesh b998e7dbb6
Allow overriding root disk offering & size, and expunge old root disk while restoring a VM (#8800)
* Allow overriding root diskoffering id & size while restoring VM

* UI changes

* Allow expunging of old disk while restoring a VM

* Resolve comments

* Address comments

* Duplicate volume's details while duplicating volume

* Allow setting IOPS for the new volume

* minor cleanup

* fixup

* Add checks for template size

* Replace strings for IOPS with constants

* Fix saveVolumeDetails method

* Fixup

* Fixup UI styling
2024-04-12 17:47:52 +05:30
João Jandre 154566f914 Updating pom.xml version numbers for release 4.18.2.0
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-04-12 08:25:04 -03:00
Vishesh 730cc5d5b8
Change iops on offering change (#8872)
* Change IOPS on disk offering change

* Remove iops & bandwidth limits before copying template

* minor refactor

* Handle diskOfferingDetails

* Fixup
2024-04-11 17:01:55 +05:30
Abhishek Kumar ffd59720dd
storage,plugins: delegate allow zone-wide volume migration check and access grant check to storage drivers (#8762)
* storage,plugins: delegate allow zone-wide volume migration check and access grant to storage drivers

Following checks have been delegated to storage drivers,
- For volumes on zone-wide storage, whether they need storage migration when VM is migrated
- Whether volume requires grant access

Apply fixes in resolving PrimaryDataStore

* add tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unused import

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* Update engine/orchestration/src/test/java/org/apache/cloudstack/engine/orchestration/VolumeOrchestratorTest.java

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-03-18 17:28:14 +05:30
Harikrishna c462be1412
New API "checkVolume" to check and repair any leaks or issues reported by qemu-img check (#8577)
* Introduced a new API checkVolumeAndRepair that allows users or admins to check and repair volumes if any leaks are observed.
Currently this is supported only for KVM

* some fixes

* Added unit tests

* addressed review comments

* add repair volume while granting access

* Changed repair parameter to accept both leaks/all

* Introduced new global setting volume.check.and.repair.before.use to do volume check and repair before VM start or volume attach operations

* Added volume check and repair changes only during VM start and volume attach operations

* Refactored the names to look similar across the code

* Some code fixes

* remove unused code

* Renamed repair values

* Fixed unit tests

* changed version

* Address review comments

* Code refactored

* used volume name in logs

* Changed the API to Async and the setting scope to storage pool

* Fixed exit value handling with check volume command

* Fixed storage scope to the setting

* Fix volume format issues

* Refactored the log messages

* Fix formatting
2024-02-29 14:41:49 +05:30
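The commit above mentions a repair parameter accepting `leaks`/`all` and a fix to exit value handling for the check command. As a rough illustration only (not the CloudStack implementation, which is written in Java), the sketch below builds a `qemu-img check` command line and interprets its exit status using the codes listed in the qemu-img manual page. The function names `build_check_command` and `describe_exit_code` are hypothetical.

```python
from typing import List, Optional

# Exit codes of `qemu-img check`, as listed in the qemu-img manual page.
QEMU_IMG_CHECK_EXIT_CODES = {
    0: "check completed, image is consistent",
    1: "check not completed because of internal errors",
    2: "check completed, image is corrupted",
    3: "check completed, image has leaked clusters but is not corrupted",
    63: "checks are not supported by the image format",
}


def build_check_command(image_path: str, repair: Optional[str] = None) -> List[str]:
    """Return the argv for `qemu-img check`, optionally with `-r leaks` or `-r all`.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    if repair not in (None, "leaks", "all"):
        raise ValueError("repair must be None, 'leaks' or 'all'")
    argv = ["qemu-img", "check"]
    if repair is not None:
        argv += ["-r", repair]
    argv.append(image_path)
    return argv


def describe_exit_code(code: int) -> str:
    """Map a `qemu-img check` exit status to a human-readable description."""
    return QEMU_IMG_CHECK_EXIT_CODES.get(code, f"unrecognised exit code {code}")
```

A caller would pass the argv to a process runner and feed the resulting exit status to `describe_exit_code`; treating every non-zero status as the same failure would lose the distinction between leaked clusters (3) and corruption (2).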
Daan Hoogland f4987bf8ee Merge release branch 4.18 to 4.19
* 4.18:
  Storage plugin support to check if volume on datastore requires access for migration (#8655)
  CKS: fix /opt/bin/deploy-cloudstack-secret in CKS control nodes (#8697)
2024-02-26 15:53:11 +01:00
Suresh Kumar Anaparti f731fe882c
Storage plugin support to check if volume on datastore requires access for migration (#8655)
* Check if volume on datastore requires access for migration, and grant/revoke volume access if requires

* Updated default implementation for requiresAccessForMigration method in PrimaryDataStoreDriver
2024-02-26 20:16:31 +05:30
Vishesh 1a1131154e
Fixup vm powerstate update (#8545)
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-02-19 13:56:21 +01:00
dahn a0e592e945
prevent nic removal on out of bounds router stop (#8371)
Co-authored-by: Vishesh <vishesh92@gmail.com>
Co-authored-by: Wei Zhou <weizhou@apache.org>
2024-02-16 14:33:22 +01:00
Suresh Kumar Anaparti f702f7f57c
Remove sensitive params (VmPassword, etc) from VMWork log (#8553) 2024-02-05 13:26:18 +05:30
Abhishek Kumar a7b97ff3b0 Updating pom.xml version numbers for release 4.19.1.0-SNAPSHOT
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-02-02 18:06:04 +05:30
Abhishek Kumar 2746225b99 Updating pom.xml version numbers for release 4.19.0.0
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-01-29 10:21:52 +05:30
Vishesh fedcf66de0
Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (#8547)
This PR fixes a bug introduced in #8502. The timeout for script execution was set to 60 ms instead of 60 s, which resulted in the host not getting UEFI enabled. This is a blocker for the 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.

We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout, to avoid confusion for the user, since the required timeout can vary depending on the provider in use and we are only setting the wait for the default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5`, making the default value 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)
2024-01-27 23:36:13 +05:30
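The timeout arithmetic in the commit message above can be worked through in a few lines. This is a sketch under stated assumptions: the 1800 s default for the global `wait` setting and the "twice the wait" rule are taken from the message itself, while the function names are hypothetical and do not correspond to CloudStack code.

```python
# Default of the global `wait` setting, in seconds, as quoted in the commit message.
GLOBAL_WAIT_SECONDS = 1800


def modify_storage_pool_wait(global_wait: int = GLOBAL_WAIT_SECONDS) -> int:
    """Wait used for ModifyStoragePoolCommand: the global `wait` divided by 5."""
    return global_wait // 5


def effective_ms_wait(command_wait: int) -> int:
    """Time the management server actually waits: twice the wait set on a command."""
    return 2 * command_wait


print(modify_storage_pool_wait())                     # 360 seconds, i.e. 6 minutes
print(effective_ms_wait(modify_storage_pool_wait()))  # 720 seconds
print(effective_ms_wait(60))                          # 120 seconds for a 60 s ReadyCommand wait
```

Keeping the unit in the constant name is a cheap guard against the kind of ms/s mix-up the commit fixes.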
kishankavala 80bbb29abf
CleanUp Async Jobs after mgmt server maintenance (#8394)
This PR moves resources stuck in a transitional state during async job cleanup.

Problem:
During maintenance of the management server, other servers in the cluster or the same server after a restart initiate async job cleanup. However, this process leaves resources in a transitional state. The only recovery option currently available is to make direct database changes.

Solution:
This PR introduces a resolution by changing Volume, Virtual Machine, and Network resources from their transitional states. This adjustment enables the reattempt of failed operations without the need for manual database modifications.
2024-01-19 13:26:25 +05:30
Vishesh c3b77cb7b8
Fix host stuck in connecting state (#8502)
There are a lot of test failures due to test_vm_life_cycle.py in multiple PRs due to host not available for migration of VMs.
#8438 (comment)
#8433 (comment)
#7344 (comment)

While debugging I noticed that the hosts get stuck in Connecting state because MS is waiting for a response of the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (default timeout).

On the agent side, it gets stuck waiting for a response from the Script execution.

To reproduce, run smoke/test_vm_life_cycle.py (TestSecuredVmMigration test class to be specific). Once the tests are complete, you will notice that some hosts are stuck in Connecting state. And restarting the agent fails due to the named lock. Locks on DB can be checked using the below query.

SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE PROCESSLIST_ID <> CONNECTION_ID() \G

This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock from database is released.
2024-01-15 13:56:34 +05:30
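The fix described above bounds the script execution with a timeout so that a hung script cannot block the calling thread indefinitely. The actual change is in the Java agent; the Python sketch below only illustrates the general technique with the standard library, and `run_with_timeout` is a hypothetical name.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(argv: List[str], timeout_seconds: float) -> Optional[int]:
    """Run a command and return its exit code, or None if it exceeded the timeout.

    subprocess.run kills the child process when the timeout expires, so the
    caller regains control instead of waiting forever.
    """
    try:
        completed = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # A quick command finishes normally; a sleeping one is cut off.
    print(run_with_timeout([sys.executable, "-c", "pass"], 30))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

Returning a distinct value for the timeout case lets the caller release any lock it holds and report the failure, rather than leaving the host stuck in the Connecting state.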
Abhishek Kumar 3936f7c2cf
vm-import: kvm import and fix volume size when lesser than 1GiB (#8500)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Daan Hoogland <daan@onecht.net>
2024-01-12 13:32:02 +01:00
Nicolas Vazquez a3a4833c3e
Fixes for KVM unmanaged instances import on advanced network and VNC password (#8492)
This PR fixes a regression caused by #8465 on advanced zones, import fails with:

2024-01-10 12:13:33,234 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Allocating nic for vm 142272e8-9e2e-407b-9d7e-e9a03b81653c in network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10} during import
2024-01-10 12:13:33,239 ERROR [o.a.c.v.UnmanagedVMsManagerImpl] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Failed to import NICs while importing vm: i-2-31-VM
com.cloud.exception.InsufficientVirtualNetworkCapacityException: Unable to acquire Guest IP  address for network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10}Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.importNic(NetworkOrchestrator.java:4582)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importNic(UnmanagedVMsManagerImpl.java:859)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importVirtualMachineInternal(UnmanagedVMsManagerImpl.java:1198)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstanceFromHypervisor(UnmanagedVMsManagerImpl.java:1511)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.baseImportInstance(UnmanagedVMsManagerImpl.java:1342)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstance(UnmanagedVMsManagerImpl.java:1282)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

It also addresses the VNC password field being set instead of a fixed string.
2024-01-12 14:14:01 +05:30
Nicolas Vazquez b8d3e342be
Fix KVM import unmanaged instances on basic zone (#8465)
This PR fixes import unmanaged instances on KVM basic zones, on top of #8433

Fixes: #8439: point 1
2024-01-10 13:21:00 +05:30
Abhishek Kumar 26214ea139 Merge remote-tracking branch 'apache/4.18' 2023-12-21 20:55:38 +05:30
Wei Zhou 9d3a7be4dd
server: fix debug message when expunging a vm (#8374)
This PR fixes the debug message when expunging a VM.
2023-12-21 14:17:57 +05:30