Commit Graph

928 Commits

Author SHA1 Message Date
Abhishek Kumar 4400e02a1b
framework/config,server: configkey caching (#472)
Added caching for ConfigKey value retrievals based on the Caffeine
in-memory caching library.
https://github.com/ben-manes/caffeine
Currently, expire time for a cache is 1 minute and each update of the
config key invalidates the cache.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-09-03 15:53:08 +05:30
Rohit Yadav b46e4d4bbf
framework/cluster: improve cluster service and integration API service (#465)
- mTLS implementation for cluster service communication
- Listen only on the specified cluster node IP address instead of all interfaces
- Validate incoming cluster service requests are from peer management servers based on the server's certificate dns name which can be through global config - ca.framework.cert.management.custom.san
- Hardening of KVM command wrapper script execution
- Improve API server integration port check
- cloudstack-management.default: don't have JMX configuration if not needed. JMX is used for instrumentation; users who need to use it should enable it explicitly

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Wei Zhou <weizhou@apache.org>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
(cherry picked from commit 4f5561937c)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2024-07-09 09:03:40 +05:30
Suresh Kumar Anaparti be87b1a668
FR74: Mitigation for non-scalable ScaleIO clients (#447)
* Mitigation for non-scalable Powerflex/ScaleIO clients
- Added ScaleIOSDCManager to manage SDC connections, checks clients limit, prepare and unprepare SDC on the hosts.
- Added commands for prepare and unprepare storage clients to prepare/start and stop SDC service respectively on the hosts.
- Introduced config 'storage.pool.connected.clients.limit' at storage level for client limits, currently support for Powerflex only.

* tests issue fixed

* refactor / improvements

* lock with powerflex systemid while checking connections limit

* updated powerflex systemid lock to hold till sdc preparation

* Added custom stats support for storage pool, through listStoragePools API

* code improvements, and unit tests

* Update config 'storage.pool.connected.clients.limit' to dynamic, and some improvements

* Stop SDC on host after migration if no volumes mapped to host

* Wait for SDC to connect after scini service start, and some log improvements

* Do not throw exception (log it) when SDC is not connected while revoking access for the powerflex volume

* some log improvements
2024-06-27 18:47:50 +05:30
Vishesh c2de75744e
kvm: Add support for cgroupv2 (#8252) (#459)
* kvm: Add support for cgroupv2 (#8252)

1. Problem description

In Apache CloudStack (ACS), when a VM is deployed in a host with the KVM hypervisor, an XML file is created in the assigned host, which has a property shares that defines the weight of the VM to access the host CPU. The value of this property has no unit, and it is a relative measure to calculate how much CPU a given VM will have in the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies at the range value of shares that varies between both versions: [2, 264144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000 and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default one in current Linux distributions; thus, it is necessary to address this limitation.

    Equation 1
    shares = CPU * speed

Fixes: #6744
2. Proposed changes

To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, using the number of cores and the nominal speed of the host's CPU as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.

The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed with a value of 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.

    Equation 2
    shares = (VM requested shares * cgroup upper limit)/host max shares

To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM calculated shares will be verified if it does not surpass the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, the scale of VMs will also have the same verification for the VM's host.

To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency.

It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now.

* Update overcommit ratio during live VM migration

* minor refactoring

---------

Co-authored-by: Bryan Lima <42067040+BryanMLima@users.noreply.github.com>
2024-06-27 12:22:17 +05:30
Suresh Kumar Anaparti bda0543dd0
ScaleIO volume live migration - use usable bytes from source disk to format the destination disk (#452)
* ScaleIO volume live migration - use usable bytes from source disk to format the destination disk

* Don't abort block copy job when cur,end = 0

* code improvements
2024-06-10 14:12:06 +05:30
Wei Zhou e065c93c3f
Apple FR76: Implicit host tags (#427)
* Merge two HostTagVO and HostTagDaoImpl

* Apple FR76: dynamic host tags

* Revert "Apple FR76: dynamic host tags"

This reverts commit 01b93a873f167018c4fafd0744c0de07ae4de4ed.

* Apple FR76: Implicit host tags

* Apple FR76: address Abhishek's comments

* Apple FR76: move updateImplicitTags

* Apple FR76: add since to other two responses

* Update 8929: add unit test in LibvirtComputingResourceTest

* Update variable names

* Update FR76: add explicithosttags in response

* Update FR76 UI: Update explicit host tags

* Update 8929: remove host tags and change labels on UI

* Update: ui polish for host tags

* fix since in responses

* Update 8929: fix UI error if no host tags
2024-05-30 17:20:37 +05:30
Marcus Sorensen 227dc5e86a
Add ability to set cpu.threadspercore similar to existing cpu.corespersocket (#411)
* Add ability to set cpu.threadspercore similar to existing cpu.corespersocket

* Add license to new test file

* Add tests to handle some edge cases

* Add some edge test cases to CPU topology

* Rework logic on KVM CPU topology, handle more cases

* Add more test cases

* Add more test cases

* Update plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java

Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>

* Added cpu.threadspercore detail in listDetailOptions response (for KVM hypervisor)

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-04-10 09:58:03 -06:00
Marcus Sorensen 631b0960f3
Allow kvm storage plugin to customize diskdef, add geometry (#402)
* Allow kvm storage plugin to customize diskdef, add geometry

* formatting update

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-04-08 09:28:59 -06:00
Rohit Yadav 0c23820c7c
Merge pull request #414 from shapeblue/security-backport418
Backport upstream security fixes to apple-base418
2024-04-03 19:58:28 +05:30
Marcus Sorensen ac4b030759
Mark libvirt events experimental, add properties flag (#404)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-04-03 08:03:26 -06:00
Abhishek Kumar 996ae9a959 engine-storage: control download redirection
Add a global setting to control whether redirection is allowed while
downloading templates and volumes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-04-01 09:23:17 +05:30
Marcus Sorensen 98dda22a83
Support KVM storage implementations controlling logical/physical block size (#390)
* Support KVM storage implementations controlling logical/physical block io size

* Support custom block size during disk attach

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-03-15 14:06:33 +05:30
Harikrishna 747d1101c1
New API "checkVolume" to check and repair any leaks or repair all issues (#362)
* Introduced a new API "checkVolumeAndRepair" that allows users or admins to check and repair if any leaks observed.
Currently this is supported only for KVM

* some fixes

* Added unit tests

* addressed review comments

* add repair volume while granting access

* Changed repair parameter to accept both leaks/all values

* Introduced new global setting volume.check.and.repair.before.use to do volume check and repair before VM start or volume attach operations

* Added volume check and repair changes only during VM start and volume attach operations

* Refactored the names to look similar  across the code

* Some code fixes

* remove unused code

* Renamed repair values

* Addressed review comments

* code refactored

* used volume name in logs

* Changed the API to Async and the setting scope to storage pool

* Fixed exit value handling with check volume command

* Fixed storage scope to the setting

* Fixed volume format issues

* Refactored the log messages

* Fix formatting
2024-02-29 14:40:40 +05:30
Vishesh f30e07b312
Fix host stuck in connecting state (#375)
* Fix host stuck in connecting state (#8502)

There are a lot of test failures due to test_vm_life_cycle.py in multiple PRs due to host not available for migration of VMs.
#8438 (comment)
#8433 (comment)
#7344 (comment)

While debugging I noticed that the hosts get stuck in Connecting state because MS is waiting for a response of the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (default timeout).

On the agent side, it gets stuck waiting for a response from the Script execution.

To reproduce, run smoke/test_vm_life_cycle.py (TestSecuredVmMigration test class to be specific). Once the tests are complete, you will notice that some hosts are stuck in Connecting state. And restarting the agent fails due to the named lock. Locks on DB can be checked using the below query.

SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE PROCESSLIST_ID <> CONNECTION_ID() \G;

This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock from database is released.

* Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (#8547)

This PR fixes bug introduced in #8502. Timeout for script execution was set to 60 ms instead of 60s which resulted in host not getting UEFI enabled. This is a blocker for 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.

We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout to avoid confusion for the user. Since, the required timeout can vary depending on the provider in use and we are only setting the wait for default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5` making the default value of 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time, the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)

* fixup
2024-02-21 13:44:53 +05:30
Marcus Sorensen f49265c14c
Fix missing code from backport of 4.16 version of dom0 CPU reserve (#374)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-02-05 11:53:45 +05:30
Marcus Sorensen e610d2c54c
Fix libvirt domain event listener by properly processing events (#364)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-01-29 14:46:20 +05:30
Marcus Sorensen 40dd867198
Apple base418 storagepooltype as class (#351)
* StoragePoolType as a class

* Fix agent side StoragePoolType enum to class

* Handle StoragePoolType for StoragePoolJoinVO

* Since StoragePoolType is a class, it cannot be converted by @Enumerated annotation.
Implemented conveter class and logic to utilize @Convert annotation.

* Fix UserVMJoinVO for StoragePoolType

* fixed missing imports

* Since StoragePoolType is a class, it cannot be converted by @Enumerated annotation.
Implemented conveter class and logic to utilize @Convert annotation.

* Fixed equals for the enum.

* removed not needed try/catch for prepareAttribute

* Added license to the file.

* Implemented "supportsPhysicalDiskCopy" for storage adaptor. (#352)

Co-authored-by: mprokopchuk <mprokopchuk@apple.com>

* Add javadoc to StoragePoolType class

* Add unit test for StoragePoolType comparisons

* StoragePoolType "==" and ".equals()" fix.

* Fix for abstract storage adaptor set up issue

* review comments

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: mprokopchuk <mprokopchuk@apple.com>
Co-authored-by: mprokopchuk <mprokopchuk@gmail.com>
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2024-01-25 14:58:44 +05:30
slavkap a768c96a6d Create snapshot from VM snapshot without memory for NFS/Local storage (#8117)
(cherry picked from commit 6ae3b73ca2)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-10-26 12:18:56 +05:30
Daniel Augusto Veronezi Salvador 5b7837eae1 Fix: Convert volume to another directory instead of copying it while taking volume snapshots on KVM (#8041)
(cherry picked from commit 9b8eaeea78)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-10-09 19:12:40 +05:30
Marcus Sorensen c01ed90569 Trigger out of band VM state update via libvirt event when VM stops (#7963)
* Trigger out of band VM state update via libvirt event when VM stops

* Add License headers, refactor nested try

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
(cherry picked from commit 3694667f50)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-09-28 12:22:15 +05:30
Rohit Yadav e77107819b kvm: skip KVMHostInfoTest when build OS isn't Linux
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-09-27 17:47:10 +05:30
Marcus Sorensen 33a5c159a3 KVM Agent config to reserve dom0 CPUs (#326)
Cherry-picked from 1a722edfce6234319c5eaecf3cd0076fa9690267

Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-09-27 17:15:40 +05:30
Marcus Sorensen 7ed872c1d6 Use direct download timeout configs for URL check (#316)
(cherry picked from commit 289fe4bd5e8fad79fc416a26c3345ff93efb466a)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-09-27 17:06:20 +05:30
Marcus Sorensen a571cde7a9 Increase reserve on ScaleIO disk formatting for fragmentation (#317)
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
(cherry picked from commit 59217e3fe19fc01863d678a34a618914afaf901b)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-09-27 15:43:14 +05:30
Nicolas Vazquez 20952b4842 Auto Enable/Disable KVM hosts (#7170)
* Auto Enable Disable KVM hosts

* Improve health check result

* Fix corner cases

* Script path refactor

* Fix sonar cloud reports

* Fix last code smells

* Add marvin tests

* Fix new line on agent.properties to prevent host add failures

* Send alert on auto-enable-disable and add annotations when the setting is enabled

* Address reviews

* Add a reason for enabling or disabling a host when the automatic feature is enabled

* Fix comment on the marvin test description

* Fix for disabling the feature if the admin has manually updated the host resource state before any health check result

(cherry picked from commit be66eb2a35)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-09-15 13:15:59 +05:30
Wei Zhou 126dd5fa4c
kvm: fix live vm migration between local storage pools (#7945) 2023-09-07 08:22:37 +05:30
Nicolas Vazquez 57c61fb33c
Fix direct download https compressed qcow2 template checker (#7932)
This PR fixes an issue on direct download while registering HTTPS compressed files
Fixes: #7929
2023-09-01 08:16:03 +02:00
fermosan fa58f59619
kvm: Added VNI Devices as normal bridge slave devs (#7836)
This will allow for VXLAN configurations to utilize tags on the physical network of a zone
2023-08-10 08:59:28 +02:00
Rohit Yadav dc5e4f3ec6
Allow KVM overcommit to work without reducing minimum VM memory when vm ballooning is disabled (#7810)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: Daan Hoogland <daan@onecht.net>
2023-08-06 10:39:14 +02:00
dahn d127d7939d
KVM: fix SSVM starting when overprovisioning memory (#7663) 2023-07-28 11:23:30 +02:00
Marcus Sorensen 63216425d5
Set encrypted PowerFlex disk format correctly (#7735)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-07-24 13:13:46 +05:30
dahn 0aade286f5
proper storage construction (#6797) 2023-07-19 10:27:20 +02:00
dahn 2752c49fa7
agent: get the right controll cidr (#7580)
Fixes: #7574
2023-07-07 22:57:58 +05:30
Vishesh 594c70dde0
Sync precommit config from main (#7732)
Co-authored-by: John Bampton <jbampton@users.noreply.github.com>
Co-authored-by: dahn <daan@onecht.net>
2023-07-07 11:18:16 +02:00
Nicolas Vazquez c733a23c90
Fix direct download URL checks (#7693)
This PR fixes the URL check for direct downloads, in the case of HTTPS URLs the certificates were not loaded into the SSL context
2023-07-06 13:47:13 +05:30
Harikrishna 40cc10a73d
Allow volume migrations in ScaleIO within and across ScaleIO storage clusters (#7408)
* Live storage migration of volume in scaleIO within same storage scaleio cluster

* Added migrate command

* Recent changes of migration across clusters

* Fixed uuid

* recent changes

* Pivot changes

* working blockcopy api in libvirt

* Checking block copy status

* Formatting code

* Fixed failures

* code refactoring and some changes

* Removed unused methods

* removed unused imports

* Unit tests to check if volume belongs to same or different storage scaleio cluster

* Unit tests for volume livemigration in ScaleIOPrimaryDataStoreDriver

* Fixed offline volume migration case and allowed encrypted volume migration

* Added more integration tests

* Support for migration of encrypted volumes across different scaleio clusters

* Fix UI notifications for migrate volume

* Data volume offline migration: save encryption details to destination volume entry

* Offline storage migration for scaleio encrypted volumes

* Allow multiple Volumes to be migrated with migrateVirtualMachineWithVolume API

* Removed unused unittests

* Removed duplicate keys in migrate volume vue file

* Fix Unit tests

* Add volume secrets if does not exists during volume migrations. secrets are getting cleared on package upgrades.

* Fix secret UUID for encrypted volume migration

* Added a null check for secret before removing

* Added more unit tests

* Fixed passphrase check

* Add image options to the encypted volume conversion
2023-06-21 11:57:05 +05:30
Marcus Sorensen ec0f8bddf6
Support local storage live migration for direct download templates (#7453)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-05-04 17:37:58 -03:00
João Jandre 523ab58d02
Fix PR 7131 bugs and vulnerabilities (#7140) 2023-03-21 15:06:18 +01:00
Wei Zhou 8592de95fa
Move PassphraseVO to use String instead of byte[] to support Encrypt annotation (#7302)
Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-03-03 13:08:17 +01:00
Abhishek Kumar 1a03f69a3a
cleanup: remove testing logs (#7270)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2023-02-21 14:42:33 +01:00
Wei Zhou 8ef35466de
Tungsten: fix functional issues (#7173)
Co-authored-by: dahn <daan.hoogland@gmail.com>
2023-02-13 09:15:28 +01:00
David Jumani c774b865c9
Tungsten integration (#7065)
Co-authored-by: rtodirica <rtodirica@ena.com>
Co-authored-by: Huy Le <huylm@unitech.vn>
Co-authored-by: radu-todirica <Radu.Todirica@ness.com>
Co-authored-by: Huy Le <minh.le@ext.ewerk.com>
Co-authored-by: Simon Weller <siweller77@gmail.com>
Co-authored-by: dahn <daan@onecht.net>
2023-02-01 09:19:53 +01:00
Abhishek Kumar 028ca74fb6
ui,server,api: resource metrics improvements (#6803)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-01-30 09:48:03 +01:00
slavkap d288bb0c78
KVM support of iothreads and IO driver policy (#6909) 2023-01-25 12:34:05 +01:00
Wei Zhou 743ebe7278
kvm: get vm disk stats for ceph disks (#7045) 2023-01-16 14:19:14 +01:00
Rohit Yadav 55d2d26449
kvm: make UEFI host check to support both Ubuntu and EL (#7084)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-01-16 14:12:53 +01:00
Stephan Krug 4d76054377
Fix volume snapshot in VM with attached ISO (#7037)
Co-authored-by: Stephan Krug <stephan.krug@scclouds.com.br>
2023-01-04 09:24:34 +01:00
Rohit Yadav fab4fc2a14 Merge remote-tracking branch 'origin/4.17' 2022-12-22 22:55:15 +05:30
Paula Oliveira 0fe2e6950e
Improving code related to the Agent properties (#6348)
Co-authored-by: Paula Zomignani Oliveira <paula@scclouds.com.br>
Co-authored-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2022-12-22 12:00:49 +01:00
Abhishek Kumar fb22c5c3c9
kvm: correctly set vm cpu topology (#6870)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2022-12-19 11:01:10 +01:00