Commit Graph

4321 Commits

Author SHA1 Message Date
Daniel Augusto Veronezi Salvador 84538da27c Updating pom.xml version numbers for release 4.18.2.5
Signed-off-by: Daniel Augusto Veronezi Salvador <gutoveronezi@apache.org>
2024-11-05 00:54:55 -03:00
Daniel Augusto Veronezi Salvador 966b75d0b9 Verify QCOW2 features on direct download of template 2024-11-05 00:26:19 -03:00
Daan Hoogland 54b3519df1 Updating pom.xml version numbers for release 4.18.2.4
Signed-off-by: Daan Hoogland <daan@onecht.net>
2024-10-03 17:36:32 +02:00
Wei Zhou 124d6b8b81 util: check JSESSIONID in cookies if user is passed 2024-10-03 17:35:54 +02:00
Daan Hoogland b97bd3bee1 fix quota resource access validation 2024-10-03 15:33:41 +02:00
nvazquez be191f5ad7
Updating pom.xml version numbers for release 4.18.2.3
Signed-off-by: nvazquez <nicovazquez90@gmail.com>
2024-08-02 17:24:50 -03:00
Abhishek Kumar 22baf2494d Updating pom.xml version numbers for release 4.18.2.2
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-07-15 17:37:07 +05:30
Abhishek Kumar f0faa4a6b3 saml: signature check improvements
Adminstrators should ensure that IDP configuration has a signing certificate for the actual signature check to be performed. In addition to this, this change introduces a new global setting saml2.check.signature, with the default value of true, which can deliberately fail a SAML login attempt when the SAML response has a missing signature.
Purges the SAML token upon handling the first SAML response.

Authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-07-15 17:35:07 +05:30
Abhishek Kumar ef5b5bbd4e Updating pom.xml version numbers for release 4.18.2.1
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-07-04 16:16:56 +05:30
Abhishek Kumar 4f5561937c framework/cluster: improve cluster service and integration API service
- mTLS implementation for cluster service communication
- Listen only on the specified cluster node IP address instead of all interfaces
- Validate incoming cluster service requests are from peer management servers based on the server's certificate dns name which can be through global config - ca.framework.cert.management.custom.san
- Hardening of KVM command wrapper script execution
- Improve API server integration port check
- cloudstack-management.default: don't have JMX configuration if not needed. JMX is used for instrumentation; users who need to use it should enable it explicitly

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Wei Zhou <weizhou@apache.org>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-07-04 16:08:18 +05:30
João Jandre 154566f914 Updating pom.xml version numbers for release 4.18.2.0
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-04-12 08:25:04 -03:00
Rene Peinthor 6cd5c6a1d0
linstor: Do not pretend handling disconnect paths that are non Linstor (#8897) 2024-04-12 08:23:15 -03:00
Abhishek Kumar ff3e9bd821 engine-storage: control download redirection
Add a global setting to control whether redirection is allowed while
downloading templates and volumes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-04-04 14:11:05 +05:30
Henrique Sato 223a9b8031
Quota tariff events (#8030)
Co-authored-by: Henrique Sato <henrique.sato@scclouds.com.br>
2024-03-06 17:33:39 +01:00
dahn 56e0450526
Logging improvements on migration in the VmwareResource (#8300) 2024-02-28 15:29:35 +05:30
Suresh Kumar Anaparti f731fe882c
Storage plugin support to check if volume on datastore requires access for migration (#8655)
* Check if volume on datastore requires access for migration, and grant/revoke volume access if requires

* Updated default implementation for requiresAccessForMigration method in PrimaryDataStoreDriver
2024-02-26 20:16:31 +05:30
Wei Zhou 18c3d470c6
CKS: fix /opt/bin/deploy-cloudstack-secret in CKS control nodes (#8697) 2024-02-26 14:21:26 +01:00
Wei Zhou 8d4b4dcec4
CKS: add kube config path in extra control nodes (#8658) 2024-02-16 15:01:27 +01:00
dahn 672206c312
kvm: ITCO watchdog added (#8282)
* ITCO watchdog added

* add inject-nmi action

* Update plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtVMDef.java

Co-authored-by: Wei Zhou <weizhou@apache.org>

---------

Co-authored-by: Wei Zhou <weizhou@apache.org>
2024-02-12 08:54:39 +01:00
Rene Peinthor 393f3d7727
linstor: use relative hostname path (#8633)
As described in issue #8310 some older distributions don't have
hostname in /usr/bin so rely on PATH resolving
2024-02-09 11:49:20 +01:00
Rene Peinthor 56f0448f0d
Linstor fix migration while node offline (#8610)
* linstor: Add util method getBestErrorMessage from main

* linstor: failed remove of allow-two-primaries is no fatal error

* linstor: Fix failure if a Linstor node is down while migrating

If a Linstor node is down while migrating resource, allow-two-primaries
setting will fail because we can't reach the downed node. But it will
still set the property on the other nodes and migration should work.
We now just report an error instead of completely failing.
2024-02-08 23:57:38 +05:30
Wei Zhou 69e8ebc03f
CKS: retry if unable to drain node or unable to upgrade k8s node (#8402)
* CKS: retry if unable to drain node or unable to upgrade k8s node

I tried CKS upgrade 16 times, 11 of 16 upgrades succeeded.

2 of 16 upgrades failed due to
```
error: unable to drain node "testcluster-of7974-node-18c8c33c2c3" due to error:[error when evicting pods/"cloud-controller-manager-5b8fc87665-5nwlh" -n "kube-system": Post "https://10.0.66.18:6443/api/v1/namespaces/kube-system/pods/cloud-controller-manager-5b8fc87665-5nwlh/eviction": unexpected EOF, error when evicting pods/"coredns-5d78c9869d-h5nkz" -n "kube-system": Post "https://10.0.66.18:6443/api/v1/namespaces/kube-system/pods/coredns-5d78c9869d-h5nkz/eviction": unexpected EOF], continuing command...
```

3 of 16 upgrades failed due to
```
Error from server: error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=roles", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=Role"
Name: "kubernetes-dashboard", Namespace: "kubernetes-dashboard"
from server for: "/mnt/k8sdisk//dashboard.yaml": etcdserver: leader changed
```

* CKS: remove tests of creating/deleting HA clusters as they are covered by the upgrade test

* Update PR 8402 as suggested

* test: remove CKS cluster if fail to create or verify
2024-02-06 11:14:10 +01:00
Lucas Martins 1c98b5a4e5 Change Cryptsetup validation (#8482)
Co-authored-by: lucas.martins.scclouds <lucas.martins@scclouds.com.br>
2024-02-01 09:43:28 +01:00
Wei Zhou b34f093137
veeam: fix some issues with restoring volume from backup and attaching it to VM (#8570)
* veeam: detach only the restored volume during backup restore

Steps to reproduce the issue
1. create a VM (A) with ROOT and DATA disk
2. assign to a backup offering
3. create backup
4. create another VM (B)
5. restore the DATA disk of VM A, and attach to VM B
6. When operation is done, check the datastore

Without this change, the ROOT image is not removed and left over on the datastore.
```
[root@ref-trl-5933-v-Mr8-wei-zhou-esxi2:/vmfs/volumes/5f60667d-18d828eb] ls -l /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-dfb6f21c-a941-49db-9963-4f0286a17dac
total 1784840
-rw-------    1 root     root     5242880000 Jan 24 09:23 ROOT-722_2-flat.vmdk
-rw-------    1 root     root           499 Jan 24 09:23 ROOT-722_2.vmdk
```

With this change, the whole temporary vm has been destroyed.
```
[root@ref-trl-5933-v-Mr8-wei-zhou-esxi2:/vmfs/volumes/5f60667d-18d828eb] ls -l /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-734bee3b-640c-4ff0-a34b-bc45358565b2
ls: /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-734bee3b-640c-4ff0-a34b-bc45358565b2: No such file or directory
```

* veeam: fix wrong disk size in debug message

* veeam: sync backup repository after operations are done

got exception of some operations which succeeds due to the following error
```
2024-01-19 10:59:52,846 DEBUG [o.a.c.b.v.VeeamClient] (API-Job-Executor-42:ctx-716501bb job-4373 ctx-2359b76d) (logid:b5e19a17) Veeam response for PowerShell commands [PowerShell Import-Module Veeam.Backup.PowerShell -WarningAction SilentlyContinue;$restorePoint = Get-VBRRestorePoint ^| Where-Object { $_.Id -eq '1d99106a-b5c8-4a1e-958d-066a987caa5f' };if ($restorePoint) { Remove-VBRRestorePoint -Oib $restorePoint -Confirm:$false;$repo = Get-VBRBackupRepository;Sync-VBRBackupRepository -Repository $repo;} else { ; Write-Output 'Failed to delete'; Exit 1;}] is: [^M
Restore Type       Job Name             State      Start Time             End Time               Description           ^M
------------       --------             -----      ----------             --------               -----------           ^M
ConfResynchronize  Configuration Dat... Starting   19/01/2024 10:59:52    01/01/1900 00:00:00                          ^M
^M
^M
Remove-VBRRestorePoint : Win32 internal error "Access is denied" 0x5 occurred while reading the console output buffer. ^M
Contact Microsoft Customer Support Services.^M
At line:1 char:196^M
+ ... orePoint) { Remove-VBRRestorePoint -Oib $restorePoint -Confirm:$false ...^M
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^M
    + CategoryInfo          : ReadError: (:) [Remove-VBRRestorePoint], HostException^M
    + FullyQualifiedErrorId : ReadConsoleOutput,Veeam.Backup.PowerShell.Cmdlets.RemoveVBRRestorePoint^M
 ^M
].
```

* veeam: fix unable to detach volume when restore backup and attach to vm then detach the volume

It also happened when destroy the original or backup VM

```
2024-01-24 10:10:03,401 ERROR [c.c.s.r.VmwareStorageProcessor] (DirectAgent-74:ctx-95b24ac7 10.0.35.53, job-25995/job-25996, cmd: DettachCommand) (logid:7260ffb8) Failed to detach volume!
java.lang.RuntimeException: Unable to access file [de52fdd3386b3d67b27b3960ecdb08f4] i-2-723-VM/7c2197c129464035bab062edec536a09-flat.vmdk
        at com.cloud.hypervisor.vmware.util.VmwareClient.waitForTask(VmwareClient.java:426)
        at com.cloud.hypervisor.vmware.mo.DatastoreMO.moveDatastoreFile(DatastoreMO.java:290)
        at com.cloud.storage.resource.VmwareStorageLayoutHelper.syncVolumeToRootFolder(VmwareStorageLayoutHelper.java:241)
        at com.cloud.storage.resource.VmwareStorageProcessor.attachVolume(VmwareStorageProcessor.java:2150)
        at com.cloud.storage.resource.VmwareStorageProcessor.dettachVolume(VmwareStorageProcessor.java:2408)
        at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:174)
        at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:71)
        at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:589)
        at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2024-01-24 10:10:03,402 INFO  [c.c.h.v.u.VmwareHelper] (DirectAgent-74:ctx-95b24ac7 10.0.35.53, job-25995/job-25996, cmd: DettachCommand) (logid:7260ffb8) [ignored]failed to get message for exception: Unable to access file [de52fdd3386b3d67b27b3960ecdb08f4] i-2-723-VM/7c2197c129464035bab062edec536a09-flat.vmdk
```

* vmware: create restored volume with new UUID and attach to VM
2024-01-29 11:40:43 +01:00
Wei Zhou 33bb92acce
Veeam: Support Veeam 11 and 12 (#8241)
This PR fixes several issues in the testing of Veeam 11 and Veeam12
- Import Veeam.Backup.PowerShell and silently ignore the warning messages
- Fix issue when assign vm to backup offerings, which caused by separator (\r\n)
- Fix authorization failure in veeam 12a, which is because v1_4 is not supported in veeam 12a any more
- Fix exception if backup name has space
- Fix backup metrics in veeam12, which is because powershell command does not return the values needed
- Fix Incorrect datetime value, which is because powershell command returns a datetime which is not supported in Java
- Fix issue during backup restoration if VM has both ROOT and DATA disks.

This PR also has the following update
- Add integration test test/integration/smoke/test_backup_recovery_veeam.py
- Make some UI changes
- Add zone setting backup.plugin.veeam.version. If it is not set, CloudStack will get veeam version via powershell commands.
- Add zone setting backup.plugin.veeam.task.poll.interval and backup.plugin.veeam.task.poll.max.retry
2024-01-19 18:42:01 +01:00
Wei Zhou ab70108f15
CKS: create Security Groups for CKS clusters of each account (#8316)
This PR fixes #7684

The security groups contain the same rules for port 22 and 6443, no need to recreate for each CKS cluster.
2023-12-20 08:57:27 +05:30
Bryan Lima 3bb318bab9
kvm: Add support for cgroupv2 (#8252)
1. Problem description

In Apache CloudStack (ACS), when a VM is deployed in a host with the KVM hypervisor, an XML file is created in the assigned host, which has a property shares that defines the weight of the VM to access the host CPU. The value of this property has no unit, and it is a relative measure to calculate how much CPU a given VM will have in the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies at the range value of shares that varies between both versions: [2, 264144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000 and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default one in current Linux distributions; thus, it is necessary to address this limitation.

    Equation 1
    shares = CPU * speed

Fixes: #6744
2. Proposed changes

To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, using the number of cores and the nominal speed of the host's CPU as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.

The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed with a value of 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.

    Equation 2
    shares = (VM requested shares * cgroup upper limit)/host max shares

To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM calculated shares will be verified if it does not surpass the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, the scale of VMs will also have the same verification for the VM's host.

To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency.

It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now.
2023-12-13 10:51:24 +05:30
Harikrishna 3ce7c39bef
cks: handle errors while scaling cluster (#8107)
This PR fixes the issue #7920
2023-12-12 16:57:28 +05:30
Wei Zhou fc44df7c95
CKS: create HA cluster with 3 control VMs instead 2 (#8297)
This PR fixes the test failures with CKS HA-cluster upgrade.
In production, the CKS HA cluster should have at least 3 control VMs as well.
The etcd cluster requires 3 members to achieve reliable HA. The etcd daemon in control VMs uses RAFT protocol to determine the roles of nodes. During upgrade of CKS with HA, the etcd become unreliable if there are only 2 control VMs.
2023-12-09 11:33:05 +05:30
Peinthor Rene bba554bcc4
linstor: Fix possible NPE if Linstor storage-pool data missing (#8319)
If Linstor doesn't return storage pool info, certain values are null.
Now we assume the values are 0 if we get null values.
2023-12-08 17:02:18 +05:30
Wei Zhou 7ea068c4dc
kvm: fix error 'Failed to find passphrase for keystore: cloud.jks' when enable SSL for kvm agent (#7923) 2023-12-07 09:10:11 +01:00
Wei Zhou db6dd52f44
kvm: fix ide controller for rocky/alma vms (#8247) 2023-12-06 15:05:49 +01:00
dahn 1a2dbebe48
Let Prometheus exporter plugin support utf8 characters (#8228) 2023-11-15 09:48:11 +01:00
Abhishek Kumar d0f3233fda
edge-zone,kvm,iso,cks: allow k8s deployment with direct-download iso (#8142)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2023-11-10 13:56:05 +01:00
Harikrishna 1e133d05c7
kvm: Handle the failures when setting up memory balloon stats period for KVM VMs (#8049) 2023-11-03 09:07:11 +01:00
slavkap 6ae3b73ca2
Create snapshot from VM snapshot without memory for NFS/Local storage (#8117) 2023-10-26 08:46:14 +02:00
Peinthor Rene 67cb9b9e40
linstor: fix template copy on non hyperconverged setups (#8114)
Making a diskful resource was meant as an optimization,
but cannot work on non hyperconverged setups,
as the storage nodes (diskful) are not part of the cloudstack cluster.
2023-10-19 10:46:20 +05:30
Harikrishna 76ab621a5a
Fix UUID for child datastores in all cases (#8057) 2023-10-18 13:00:55 +05:30
Peinthor Rene 4a86a0d233
linstor: Fix template volume missing on copy node (#8082)
A TODO was overseen and never implemented,
which could trigger the following bug:

If Linstor didn't create a resource (diskless or diskfull) on
the cloudstack choosen node, it would not be able to copy the
template data there, it even seems no error was
triggered and the new template file silently just became
empty/corrupt.
2023-10-17 17:05:42 +05:30
Ben a20ab40b67
Ensure getCapacityState() is not called for hosts in maintenance (#8025) 2023-10-06 09:49:57 +02:00
Daniel Augusto Veronezi Salvador 9b8eaeea78
Fix: Convert volume to another directory instead of copying it while taking volume snapshots on KVM (#8041) 2023-10-06 09:47:34 +02:00
Peinthor Rene 96205a51ef
linstor: resize root disk on offerings with different size (#7952) 2023-10-02 15:58:00 +02:00
Marcus Sorensen 221f863939
Use direct download timeout configs for URL check (#7948)
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-09-28 12:11:38 +05:30
dahn 09ae0499b2
ldap trust map cleanup on domain delete (#7915)
Co-authored-by: Wei Zhou <weizhou@apache.org>
2023-09-19 08:01:15 +02:00
Marcus Sorensen f049d4d189
Increase reserve on ScaleIO disk formatting for fragmentation (#7955)
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-09-14 16:43:16 +05:30
Wei Zhou 246bb24b0f Updating pom.xml version numbers for release 4.18.2.0-SNAPSHOT
Signed-off-by: Wei Zhou <weizhou@apache.org>
2023-09-12 17:26:53 +02:00
Wei Zhou 4bdff06acd Updating pom.xml version numbers for release 4.18.1.0
Signed-off-by: Wei Zhou <weizhou@apache.org>
2023-09-07 08:50:50 +02:00
Abhishek Kumar f049f5409e
server: fix dualstack ipv6 networks for vxlan (#7933)
Fixes #7926

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2023-09-07 08:46:45 +02:00
Wei Zhou 126dd5fa4c
kvm: fix live vm migration between local storage pools (#7945) 2023-09-07 08:22:37 +05:30
Nicolas Vazquez 57c61fb33c
Fix direct download https compressed qcow2 template checker (#7932)
This PR fixes an issue on direct download while registering HTTPS compressed files
Fixes: #7929
2023-09-01 08:16:03 +02:00