Commit Graph

32454 Commits

Author SHA1 Message Date
nvazquez ac5085f641
Fix index name on vmsnapshot 2021-12-18 01:00:14 -03:00
nvazquez e9b42a3fb2
Add java changes 2021-12-17 11:24:58 -03:00
nvazquez f7b40961af
Add SQL changes on the upgrade schema 2021-12-17 10:57:29 -03:00
Marcus Sorensen 490429f2e2
kvm: Randomize managed volume copy host (#121)
* Managed volume copy was always returning first host that could see storage pools

* Fix NPE in logging for ScaleIOPrimaryDataStoreDriver

Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
2021-11-18 13:21:16 +05:30
sureshanaparti 77fc3be9db
ScaleIO/PowerFlex volume migration improvements (#119)
* Report the PowerFlex/ScaleIO disk copy failure during volume migration and fail the migration.

* Add the source format to conversion cmd 'qemu-img convert' when specified explicitly.

* Ignore file encryption for now, in order to specify the source format
2021-10-08 12:50:54 +05:30
Rohit Yadav 1d14552590
kvm: Fixed removal of hosts from certsmap when running certificate auto-renewal (#4156) (#117)
When a host connects to a management server, the host IP address and the certificate are stored in memory on the management server. This mapping is checked periodically to determine if any certificates are due to expire.

Before a certificate is renewed, a few checks are done to determine if the host is connected to the management server by fetching the host record from the database. The problem is that if the wrong record (e.g. a stale, removed one) is fetched, the host is never checked for renewal.

This PR improves the host record fetch from the database by looking only at hosts that are not removed.

Fixes: #4129
(cherry picked from commit 7b881517b7)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Co-authored-by: Spaceman1984 <49917670+Spaceman1984@users.noreply.github.com>
2021-10-08 12:48:31 +05:30
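The fix above hinges on excluding removed host rows when resolving a host by IP. A minimal sketch of that lookup, with hypothetical types (not CloudStack's actual DAO API):

```java
import java.util.Date;
import java.util.List;
import java.util.Optional;

// Hypothetical host row mirroring the "removed" column semantics the
// commit message describes; not the real HostVO class.
record HostRow(long id, String ip, Date removed) {}

public class CertRenewalLookupSketch {
    // Resolve a host by IP while skipping rows already marked removed, so a
    // stale row can no longer shadow the live host and silently skip its
    // certificate renewal check.
    static Optional<HostRow> findLiveHost(List<HostRow> rows, String ip) {
        return rows.stream()
                .filter(h -> h.removed() == null) // only hosts that are not removed
                .filter(h -> ip.equals(h.ip()))
                .findFirst();
    }

    public static void main(String[] args) {
        List<HostRow> rows = List.of(
                new HostRow(1L, "10.1.1.5", new Date()), // removed row, previously fetched by mistake
                new HostRow(2L, "10.1.1.5", null));      // live row that should drive renewal
        System.out.println(findLiveHost(rows, "10.1.1.5").map(HostRow::id).orElse(-1L)); // -> 2
    }
}
```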
sureshanaparti e91843af59
Blocked the attach volume operation for uploaded volume on ScaleIO/PowerFlex storage pool, and updated unit tests. (#116) 2021-06-11 13:02:48 +05:30
sureshanaparti c884b5592b
powerflex: Gateway client improvements (#114)
This PR updates the PowerFlex/ScaleIO gateway client with the following improvements.

- Added connection manager to the gateway client.
- Renew the client session on '401 Unauthorized' response.
- Refactored the gateway client calls, for GET and POST methods.
- Consume the http entity content after login/(re)authentication and close the content stream if it exists.
- Added storage pool client max connections configuration 'storage.pool.client.max.connections' (default: 100) to specify the maximum connections for the ScaleIO storage pool client.
- Updated storage pool client connection timeout configuration 'storage.pool.client.timeout' to non-dynamic (its value is picked up by the ScaleIO client when it is created, which happens either on primary storage addition or on management server (re)start for existing pools. To apply this config value to existing pools, the management server must be restarted. Changing this config to 'static' prompts the operator to restart the management server to apply the new value).
2021-06-02 15:32:49 +05:30
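A rough sketch of the two client behaviours described above, a pooled connection manager sized by 'storage.pool.client.max.connections' and one retry after re-login on a 401, using Apache HttpClient 4.x; the class name, header name, and login placeholder are illustrative, not the actual CloudStack gateway client:

```java
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class GatewayClientSketch {
    private final CloseableHttpClient http;
    private volatile String sessionToken = "";

    GatewayClientSketch(int maxConnections) {
        // Connection manager added to the client; pool size comes from
        // 'storage.pool.client.max.connections' (default 100).
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(maxConnections);
        cm.setDefaultMaxPerRoute(maxConnections);
        this.http = HttpClients.custom().setConnectionManager(cm).build();
    }

    String get(String url) throws Exception {
        HttpResponse resp = doGet(url);
        if (resp.getStatusLine().getStatusCode() == 401) {
            // Consume the entity so the pooled connection is released,
            // renew the session, then retry the call once.
            EntityUtils.consumeQuietly(resp.getEntity());
            renewSession();
            resp = doGet(url);
        }
        return EntityUtils.toString(resp.getEntity());
    }

    private HttpResponse doGet(String url) throws Exception {
        HttpGet req = new HttpGet(url);
        req.setHeader("Session-Token", sessionToken); // illustrative header; PowerFlex's real auth differs
        return http.execute(req);
    }

    private void renewSession() {
        // Placeholder for the login call: POST credentials to the gateway
        // and store the returned token for subsequent requests.
        sessionToken = "renewed-token";
    }
}
```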
sureshanaparti 66b7df6650
Clean up the dest volume when its creation fails during the migrate volume operation (#113) 2021-02-26 10:50:30 +05:30
sureshanaparti 6b8feb2b09
Fix for "Reboot skips network plugin" and Added addl functionality (details below) support for PowerFlex/ScaleIO volume operations (#112)
* Updated libvirt's native reboot operation for VM on KVM using ACPI event, and Added 'forced' reboot option to stop and start the VMs

- Added 'forced' reboot option for User VM (New parameter 'forced' in rebootVirtualMachine API, to stop and start User VM)
- Added 'forced' reboot option for System VM (New parameter 'forced' in rebootSystemVm API, to stop and then start System VM)
- Added 'forced' reboot option for Router (New parameter 'forced' in rebootRouter API, to force stop and then start Router)
- Added force reboot tests for User VM, System VM and Router

* Updated the PowerFlex/ScaleIO volume operations support in CloudStack. Added support for the following:

- PowerFlex volume migration (with snapshots) within the same PowerFlex storage cluster, using native V-Tree migration.
- PowerFlex volume migration (without snapshots) across different PowerFlex storage clusters.
    => The findStoragePoolsForMigration API returns PowerFlex pool(s) of a different instance as suitable pool(s) for volume(s) on a PowerFlex storage pool.
    => Volume(s) with snapshots are not allowed to migrate to a different PowerFlex instance.
    => Volume(s) of a running VM are not allowed to migrate to other PowerFlex storage pools.
    => Volume migration from a PowerFlex pool to a non-PowerFlex pool, and vice versa, is not supported.

- Template creation (on secondary storage) from PowerFlex/ScaleIO volume or snapshot.
- Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)

Other Changes:
- Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
- Added new response parameter "supportsStorageSnapshot" (true/false) to the volume response, and updated the UI to hide the async backup option while taking a snapshot of volume(s) with storage snapshot support.

* Provision to add PowerFlex/ScaleIO storage pool as Primary Storage from UI

* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
2021-02-19 12:54:13 +05:30
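A compact sketch of the migration guards listed in this commit (same-instance V-Tree migration for volumes with snapshots, no cross-pool-type moves, no moves for running-VM volumes); the types below are illustrative stand-ins, not CloudStack's Volume/StoragePool classes:

```java
enum PoolKind { POWERFLEX, OTHER }

// Illustrative stand-ins for the storage pool and volume state involved.
record Pool(String powerFlexInstanceId, PoolKind kind) {}
record Vol(boolean hasSnapshots, boolean attachedToRunningVm) {}

public class PowerFlexMigrationGuardSketch {
    static boolean canMigrate(Vol v, Pool src, Pool dst) {
        if (src.kind() != dst.kind()) {
            return false; // PowerFlex <-> non-PowerFlex migration is not supported
        }
        if (src.kind() == PoolKind.POWERFLEX && v.attachedToRunningVm()) {
            return false; // volumes of a running VM stay on their PowerFlex pool
        }
        boolean sameInstance = src.powerFlexInstanceId().equals(dst.powerFlexInstanceId());
        if (v.hasSnapshots() && !sameInstance) {
            return false; // with snapshots, only native V-Tree migration within the same instance
        }
        return true;
    }
}
```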
sureshanaparti 56f2c2643a
PowerFlex/ScaleIO bugfix Jan Patch (#110)
* Addressed some issues for the operations on PowerFlex storage pool.

- Updated VM Snapshot naming, for uniqueness of ScaleIO volume names when more than one volume exists in the VM.

- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).

- Updated resize volume error message string.

- Blocked the below operations on PowerFlex storage pool:
  -> Extract Volume
  -> Create Snapshot for VMSnapshot

* Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)

* Use a single gateway client per PowerFlex/ScaleIO pool and renew it when the session token expires.

The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html

* Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.

* Perform basic tests (for connectivity and file system) on router before updating the health check config data

Updated changes to:
- Validate the basic tests (connectivity and file system check) on router
- Cleanup the health check results when router is destroyed
2021-01-12 12:42:48 +05:30
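One of the items above is the sync lock taken while spooling a managed-storage template, so concurrent deployments don't copy the same template to the pool twice. A JVM-local illustration of that idea (CloudStack itself coordinates across management servers, e.g. via a database-backed lock, so this is only the shape of the fix):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class TemplateSpoolLockSketch {
    // One lock per template UUID: the first caller copies the template to
    // the pool, later callers block and then find it already spooled.
    private static final Map<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    static void spool(String templateUuid, Runnable copyTemplateToPool) {
        ReentrantLock lock = LOCKS.computeIfAbsent(templateUuid, k -> new ReentrantLock());
        lock.lock();
        try {
            copyTemplateToPool.run(); // should itself no-op if the template already exists
        } finally {
            lock.unlock();
        }
    }
}
```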
sureshanaparti ee4384e8a3
Reallocate new PowerFlex/ScaleIO ROOT volume for the VM when template to managed volume copy fails (#105) 2020-11-03 14:24:43 +05:30
sureshanaparti 7d5a4cde7c
Updated DB encryption for ScaleIO credentials and Added lock while spooling managed storage template (#103)
* Encrypt the ScaleIO storage pool credentials in the DB

* Added sync lock while spooling managed storage template
2020-10-21 17:50:33 +05:30
Suresh Kumar Anaparti a61ba8c755 Fixed NPE issue where the template is null for DATA disks. Copy the template to target storage for the ROOT disk (with template id); skip DATA disk(s) 2020-10-12 22:03:53 +05:30
Suresh Kumar Anaparti 73a6c53907 [UI] Delete the template with the specified zoneid, irrespective of whether the template is cross-zone
- Note: Post deletion, if the template doesn't exist in another zone's secondary storage, it will be marked as Inactive
2020-10-09 17:19:21 +05:30
Rohit Yadav 36166046cf
ScaleIO: Storage Plugin (Phase 0+1) (#77)
* scaleio: prototype storage plugin

- plugin skeleton
- add storage pool, create/attach data disk

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* kvm: attach disk example

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* Updated ScaleIO storage plugin to support Volume operations

* ScaleIO storage plugin - Support for VM operations and other updates

* ScaleIO storage pool plugin changes

- Added validation to check existing ScaleIO storage pool and update capacity details
- Updated resize volume for ScaleIO to pick the rounded 8GB boundary size
- Added support for setting ScaleIO storage pool statistics (bandwidthLimitInKbps, iopsLimit)

* Fixed IOPS validation and volume size update when resizing ScaleIO volume

* Removed connect/disconnect disk changes from ScaleIO storage adaptor
- ScaleIO datastore driver does map/unmap ScaleIO volume (from MS) using grant/revoke access
- Not required to map/unmap ScaleIO volume from the storage adaptor

* Updated connect disk, to wait for ScaleIO volume to become available in the KVM host

* Updated ScaleIO storage provider, pool type, url scheme and related parameters to the new "PowerFlex" brand

* Fixed size rounding issue while creating PowerFlex volume and added validations to PowerFlex Gateway API client

* Updated host sdc connection check for ScaleIO/PowerFlex pool on host connect

* Updated volume snapshots support for volumes on ScaleIO/PowerFlex storage pool and Added some validations for ScaleIO disks in host

* Added primary storage level configurable setting "storage.pool.disk.wait" to wait for disk availability

- Configure the disk availability wait time, mainly introduced for the ScaleIO/PowerFlex storage pool (can be used for other managed storage), to wait for the disk to become available on the host before performing any operation on it

* Enabled template spooling to the ScaleIO/PowerFlex storage pool and VM creation from the spooled template.
Added ScaleIO SDC limits support for volumes using offering parameters: bandwidthLimitInKbps, iopsLimit.

* Added support for VM snapshots on ScaleIO/PowerFlex storage pool
Minor improvements for IOPS (SDC Limits) configuration

* Updated access for ScaleIO/PowerFlex volumes on VM Start and Stop
Added primary storage level configurable setting "storage.pool.client.timeout" for storage API client
Enabled cluster wide storage pool support for ScaleIO/PowerFlex storage
Minor improvements for ScaleIO/PowerFlex disk access in the KVM host

* Added support for direct download of templates (raw, qcow2) on ScaleIO/PowerFlex storage pool

* Added support for config drives in host cache for KVM

- Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
- Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
- Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
- Added new parameter "vm.configdrive.host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path for config drives

* Updated disk access while migrating the VM with volumes on ScaleIO/PowerFlex storage pool
Changed the parameter "vm.configdrive.host.cache.location" to "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties to specify the host cache path
Changes to create config drives on the "/config" directory on the host cache path
Changes to support migrating a VM with a config drive on the host cache path

* Additional changes to support migrating a VM with a config drive on the host cache

* Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
Updated full deployment destination for preparing the network(s) on VM start

* Propagate the direct download certificates uploaded to the newly added KVM hosts

* Code improvements for ScaleIO/PowerFlex storage plugin

* Updated storage stats collection and tests for ScaleIO/PowerFlex storage plugin

* Fix for template size of direct download templates on capacity check for ScaleIO/PowerFlex storage pool
Updated data object grant and revoke access for connected SDCs to ScaleIO/PowerFlex storage pool

* Discover the template size for direct download templates using any available host from the zones specified on template registration

When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones

* Maintain the config drive location and use it when required on any config drive operation (migrate, delete)

* Ensure the volume to be expunged is expunge-ready on storage cleanup

* Do not set the storage migration flag for the volumes on zone wide PowerFlex/ScaleIO pool when listing the hosts available for cross-cluster migration

* Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)

* Added alerts for PowerFlex/ScaleIO SDC disconnection on the host(s)

* Retry VM deployment/start when the host cannot access volume/template on the ScaleIO/PowerFlex storage

* Changes to find a potential host that can access the ScaleIO/PowerFlex storage pool

* Updated ScaleIO/PowerFlex storage pool stats for checking the available capacity and usage

* Updated the ScaleIO/PowerFlex volume naming convention to avoid naming conflicts on sharing

* Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand

- Do not trigger any DeleteCommand for never-used or downloaded templates, as these don't exist on the datastore and cannot be deleted from it

* Updated ScaleIO/PowerFlex storage pool capacity stats

* Cleanup unused templates and host entries on PowerFlex/ScaleIO storage pool deletion

* Check whether the router filesystem is writable before performing health checks

- Introduced a new test "filesystem.writable.test" to check whether the filesystem is writable
- The router health checks keep the config info at "/var/cache/cloud" and update the monitor results at "/root"; these are different partitions, so test both locations.

* Updated the router filesystem writable check to use a script, instead of cmd execution

- Added new script "filesystem_writable_check.py" at /opt/cloud/bin/ to check whether the filesystem is writable

* Update volume stats (physical and virtual size) for the volumes on PowerFlex/ScaleIO storage pool

Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
2020-10-07 16:02:02 +05:30
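Among the many changes above, "storage.pool.disk.wait" introduces a bounded wait for the mapped disk to appear on the KVM host. A hedged sketch of that polling loop; the device path naming is an assumption for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskWaitSketch {
    // Poll until the mapped PowerFlex/ScaleIO device shows up on the host,
    // or fail once the configured 'storage.pool.disk.wait' budget is spent.
    static Path waitForDisk(String volumeId, long waitSeconds) throws InterruptedException {
        Path device = Path.of("/dev/disk/by-id/emc-vol-" + volumeId); // assumed naming scheme
        long deadline = System.currentTimeMillis() + waitSeconds * 1000L;
        while (System.currentTimeMillis() < deadline) {
            if (Files.exists(device)) {
                return device; // SDC has mapped the volume; safe to operate on it
            }
            Thread.sleep(1000L); // re-check once a second
        }
        throw new IllegalStateException(device + " not available after " + waitSeconds + "s");
    }
}
```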
Rohit Yadav f7808cfb52 server: fix TransactionLegacy DB connection leaks due to DB switching by B&R thread (#4121)
The BackupSync task would switch between databases to update backup usage
metrics in the cloud_usage.usage_backup table. The current framework and
its use in ManagedContext cause database connection (TransactionLegacy)
leaks. When the thread runs frequently, the issue is easily reproducible
and can be confirmed via heap dump analysis or JMX MBeans. This is fixed
by moving the backup usage data update to the usage server, publishing
usage events instead of switching between databases in a local thread
while in a ManagedContextRunnable.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit b54d19b3b9)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 14:06:24 +05:30
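A hedged sketch of the fix's shape: rather than opening a second TransactionLegacy against the usage database from the BackupSync thread, publish a usage event and let the usage server persist it on its own connection. The publisher interface below is illustrative, not CloudStack's actual usage event API:

```java
// Illustrative event publisher; CloudStack routes usage events through its
// own framework rather than this interface.
interface UsageEventPublisher {
    void publish(String type, long accountId, long zoneId, long backupId, long sizeBytes);
}

public class BackupSyncSketch {
    private final UsageEventPublisher events;

    BackupSyncSketch(UsageEventPublisher events) {
        this.events = events;
    }

    void recordBackupUsage(long accountId, long zoneId, long backupId, long sizeBytes) {
        // No direct write to cloud_usage.usage_backup from this thread; the
        // usage server consumes the event and updates the table itself,
        // avoiding the cross-database connection switch that leaked.
        events.publish("BACKUP.USAGE", accountId, zoneId, backupId, sizeBytes);
    }
}
```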
Spaceman1984 1ca3117fd4 storage: Fixed null pointer (#4130)
Fixes #4090

When trying to migrate a VM across 2 clusters, if a snapshot has been deleted and garbage collection has run to update the removed field, it is not possible to migrate the instance due to a null pointer.

(cherry picked from commit 6a683dcf77)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 14:06:15 +05:30
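A minimal illustration of the null guard this fix implies: snapshots whose backing record was garbage-collected resolve to a null store reference and must be skipped rather than dereferenced. Types are hypothetical:

```java
import java.util.List;
import java.util.Objects;

record SnapshotStoreRef(String installPath) {} // hypothetical store-ref row

public class SnapshotNullGuardSketch {
    // Deleted + GC-ed snapshots yield null refs; filtering them out lets
    // the cross-cluster migration proceed instead of throwing an NPE.
    static List<String> migratablePaths(List<SnapshotStoreRef> refs) {
        return refs.stream()
                .filter(Objects::nonNull)
                .map(SnapshotStoreRef::installPath)
                .toList();
    }
}
```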
Spaceman1984 8eae6d4c93 kvm: Fixed HA migrated storage error (#4079)
Fixes #4045

(cherry picked from commit fef4458830)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 14:05:58 +05:30
Spaceman1984 0503901d82 kvm: sending std output to /dev/null to prevent garbage output (#4123)
When scripts/vm/hypervisor/kvm/kvmvmactivity.sh is called with an incorrect file name, an error is printed which is then interpreted as output from the script.

When an incorrect file name is passed, the script prints out:

stat: cannot stat 'b51d7336-d964-44ee-be60-bf62783dabc': No such file or directory
=====> DEAD <======

The KVMHAVMActivityChecker.java checkingHB() process expects just
=====> DEAD <======
but gets the unexpected error message and interprets the VM as alive.

(cherry picked from commit 23fa647985)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 14:05:48 +05:30
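A sketch of the checker side of this bug, to show why stray error text flips the result: the activity check expects the script's output to be exactly the DEAD marker, so any stat error mixed in makes the VM look alive. The actual fix is in the script (discarding the error output); the comparison below is illustrative, not KVMHAVMActivityChecker's code:

```java
public class ActivityCheckSketch {
    static final String DEAD_MARKER = "=====> DEAD <======";

    static boolean isDead(String scriptOutput) {
        // Anything other than the exact marker is read as "alive".
        return scriptOutput.trim().equals(DEAD_MARKER);
    }

    public static void main(String[] args) {
        System.out.println(isDead("=====> DEAD <======"));   // true
        System.out.println(isDead("stat: cannot stat 'b51d...': No such file or directory\n"
                + "=====> DEAD <======"));                   // false -> wrongly treated as alive
    }
}
```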
NuxRo 5befcf7411 packaging: missing python3 libvirt dependency for CentOS7 (#4124)
Missing python3 libvirt bindings on CentOS7 effectively break security groups.
There are 0 firewall rules added. The agent logs report:

```
2020-06-02 10:58:34,346 DEBUG [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 26, in <module>
    import libvirt
ModuleNotFoundError: No module named 'libvirt'
```

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit db55910f6b)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 14:05:35 +05:30
andrijapanicsb 6f96b3b2b3 Updating pom.xml version numbers for release 4.14.0.0
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-05-11 15:03:14 +01:00
havengit 60d7215a06
fix dhcp lease entry wrong hostname (#4064)
When a guest VM adds a secondary NIC, it gets the wrong hostname "infiniteh" from the DHCP server (infiniteh -> infinite):
cat /etc/dhcphosts.txt
02:00:0b:ef:00:04,set:192_168_4_18,192.168.4.18,gumd-tes3,infiniteh
2020-05-11 10:56:14 +02:00
Daan Hoogland 8173741742 Merge branch '4.13' 2020-05-06 14:46:16 +00:00
Gabriel Beims Bräscher 74cf326d3b
Allow deleting snapshot on local filesystem (#4057) 2020-05-06 16:38:18 +02:00
Abhishek Kumar 09697fe112
cks: use public links for templates and binaries iso for smoke tests (#3992)
* changed template and binaries iso to public links

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* iso state check and timeout fixes

refactoring

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* changed timeouts

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2020-05-06 11:36:04 +02:00
Rohit Yadav 381039a58f
db.properties: Enforce UTC timezone by default (#4055)
* db.properties: Enforce UTC timezone by default

This would give users the ability to change the timezone

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* fix server time to UTC

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* Update the db.usage.url.params=serverTimezone=UTC per Liridon's testing

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-05-06 10:49:50 +02:00
andrijapanicsb 398e685e01 Updating pom.xml version numbers for release 4.13.2.0-SNAPSHOT
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-04-29 12:29:12 +01:00
Daan Hoogland 689e529d7b Merge release branch 4.13 to master
* 4.13:
  Fixed guest vlan range going missing when using zone wizard (#4042)
  Volume migration (#4043)
2020-04-23 20:19:30 +02:00
andrijapanicsb b2ffa3efa5 Updating pom.xml version numbers for release 4.13.1.0
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-04-23 19:17:09 +01:00
Spaceman1984 7b7caf5559
Fixed guest vlan range going missing when using zone wizard (#4042) 2020-04-23 19:57:43 +02:00
dahn c1570b9c91
Volume migration (#4043)
* Update AncientDataMotionStrategy.java

Fix: when secondary storage usage is > 90%, volume migration across primary storage causes the migration to fail and loses the volume

* Update AncientDataMotionStrategy.java

A volume is migrated across primary storage. If no secondary storage is available (or its used capacity is > 90%), the migration is canceled.
Before this change, if secondary storage could not be found, copyVolumeBetweenPools returned null.

copyAsync considers answer == null a sign of successful task execution, so it deletes the volume on the old primary storage. This is the root cause of the data loss, because the volume was never migrated at all.

* commented-out code removed

Co-authored-by: div8cn <35140268+div8cn@users.noreply.github.com>
Co-authored-by: Daan Hoogland <dahn@onecht.net>
2020-04-23 19:56:27 +02:00
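The root cause above is a null answer being read as success. A small sketch of the corrected guard in copyAsync's result handling; the Answer type is a stand-in for illustration:

```java
record Answer(boolean success, String details) {} // illustrative stand-in

public class CopyResultGuardSketch {
    static void onCopyFinished(Answer answer, Runnable deleteSourceVolume) {
        if (answer == null || !answer.success()) {
            // A null answer means the copy never ran (e.g. no secondary
            // storage below the usage threshold) -- do NOT delete the source.
            throw new IllegalStateException("volume copy failed: "
                    + (answer == null ? "no suitable secondary storage" : answer.details()));
        }
        deleteSourceVolume.run(); // safe only after a confirmed successful copy
    }
}
```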
Daan Hoogland 8e4be6dc60 Merge branch '4.13' 2020-04-16 15:27:52 +02:00
Andrija Panic b406e1dc46
Bring back vm.suspend during deleting VM snapshot (#4029) 2020-04-16 15:15:22 +02:00
Wei Zhou 2637a86ac2
kvm: suspend/resume in deleting vm snapshot on kvm (#4033) 2020-04-16 15:14:47 +02:00
dahn 1d34eed43c Cs 1268 gs
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
2020-04-16 15:13:06 +02:00
dahn 22e0fc8752 mac-check 2020-04-16 15:10:50 +02:00
dahn 6a72e6e9f8 do not put in default accept rules for DNS and BOOTPS 2020-04-16 15:09:51 +02:00
Sina Kashipazha 208e185714
FIX: prevent empty sshkey name. (#4023)
* FIX: prevent empty sshkey name.

* Move sshKeyName check before database access.

Co-authored-by: Sina Kashipazha <s.kashipazha@global.leaseweb.com>
2020-04-14 16:19:24 +02:00
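The ordering matters here: validating the key name before touching the database keeps a bad request from costing a DAO round-trip. A trivial sketch of such an up-front check (hypothetical helper, not the actual API-layer code):

```java
public class SshKeyNameCheckSketch {
    static String requireKeyName(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("sshkey name cannot be empty");
        }
        return name.trim(); // only now proceed to the database lookup
    }
}
```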
Daan Hoogland b984184b7a Merge release branch 4.13 to master
* 4.13:
  Snapshot deletion issues (#3969)
  server: Cannot list affinity group if there are hosts dedicated… (#4025)
  server: Search zone-wide storage pool when allocation algorithm is firstfitleastconsumed (#4002)
2020-04-11 16:45:00 +02:00
dahn f18fe5e1da
Snapshot deletion issues (#3969)
* Fixes snapshot deletion

* Remove legacy '@Component', it is not necessary in this bean/class.

* Fix log message missing %d and remove snapshot on DB

* Remove "dummy" boolean return statement

* Manage snapshot deletion for KVM + NFS (primary storage)

* checkstyle trailing spaces

* rename options strings to *_OPTION

* Fix typo on deleteSnapshotOnSecondaryStorage and enhance log message

* Move the snapshotDao.remove(snapshotId); (#4006)

* Fix deletesnapshot workflow to handle both snapshots created in primary storage and snapshots backed up to secondary storage

* Fix extra space

* refactor out separate handling methods for secondary and primary (reducing returns)

* return false on unexpected error or log when expected

* != instead of ==

* secondary instead of backup storage

* init to null

* Handle snapshot deletion on primary storage. When primary store ref not found for snapshot do not fail the operation.

* Fix debug levels on log messages

Co-authored-by: GabrielBrascher <gabriel@apache.org>
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Co-authored-by: Harikrishna Patnala <harikrishna.patnala@gmail.com>
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
2020-04-11 16:40:27 +02:00
Wei Zhou e0b67a4c68
server: Cannot list affinity group if there are hosts dedicated… (#4025) 2020-04-10 09:10:51 +02:00
Nicolas Vazquez 3d4b9afd62
Improvement on build time and new quality profile (#4014) 2020-04-07 10:54:41 +02:00
Wei Zhou 6bf92fb136
server: Search zone-wide storage pool when allocation algorithm is firstfitleastconsumed (#4002) 2020-04-06 22:01:40 +02:00
Nicolas Vazquez 0c4bd5346c
Remove rolling-maintenance service from debian rules (#3984) 2020-04-04 14:09:35 +02:00
Andrija Panic d52f3f4a6b
Update schema-41310to41400.sql (#3999)
* Update schema-41310to41400.sql

* update desc

* update the config key as well

* Update schema-41310to41400.sql (#4012)

* Update schema-41310to41400.sql

* update configkey desc
2020-04-04 14:07:14 +02:00
Nicolas Vazquez 22b4cca50d
Fix template registration error (#4008) 2020-04-03 20:37:00 +02:00
Rohit Yadav 5bb30f7ff3 Merge remote-tracking branch 'origin/4.13' 2020-04-02 20:47:37 +05:30
Wei Zhou 941cc4e2ee
Add support for zulu-11 (#3988)
Steps to install zulu-11 on Ubuntu 16.04:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 0xB1998361219BD9C9
echo 'deb http://repos.azulsystems.com/ubuntu stable main' | sudo tee /etc/apt/sources.list.d/azul.list
sudo apt update
sudo apt install zulu-11 -y
2020-04-01 18:39:24 +02:00
Spaceman1984 a651eaacdf
Fixed create template from snapshot never returning (#4005) 2020-04-01 17:22:47 +02:00