* Managed volume copy was always returning the first host that could see the storage pools
* Fix NPE in logging for ScaleIOPrimaryDataStoreDriver
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
* Report the PowerFlex/ScaleIO disk copy failure during volume migration and fail the migration.
* Add the source format to the conversion command 'qemu-img convert' when it is specified explicitly.
* Ignore file encryption for now, so that the source format can be specified
When a host connects to a management server, the host IP address and certificate are stored in memory on the management server. This mapping is checked periodically to determine whether any certificates are due to expire.
Before a certificate is renewed, a few checks are done to determine whether the host is connected to the management server, by fetching the host record from the database. The problem is that if the wrong record is fetched, the host is never considered for renewal.
This PR improves the host record fetch from the database by looking only at hosts that have not been removed.
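A minimal sketch of the corrected lookup, expressed as direct SQL against the 'host' table (the real fix goes through the host DAO; treat the exact columns as illustrative):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    class ActiveHostLookup {
        // Only rows with removed IS NULL are considered, so a stale (removed)
        // host record can no longer shadow the live host during renewal checks.
        static ResultSet findActiveHost(Connection conn, long hostId) throws Exception {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, private_ip_address, status FROM host "
                    + "WHERE id = ? AND removed IS NULL");
            ps.setLong(1, hostId);
            return ps.executeQuery();
        }
    }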
Fixes: #4129
(cherry picked from commit 7b881517b7)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Spaceman1984 <49917670+Spaceman1984@users.noreply.github.com>
This PR updates the PowerFlex/ScaleIO gateway client with the following improvements.
- Added connection manager to the gateway client.
- Renew the client session on '401 Unauthorized' response (sketched after this list).
- Refactored the gateway client calls, for GET and POST methods.
- Consume the http entity content after login/(re)authentication and close the content stream if exists.
- Added storage pool client max connections configuration 'storage.pool.client.max.connections' (default: 100) to specify the maximum connections for the ScaleIO storage pool client.
- Updated the storage pool client connection timeout configuration 'storage.pool.client.timeout' to be non-dynamic. Its value is picked up by the ScaleIO client when the client is created, which happens either when the primary storage is added or when the management server (re)starts for existing pools. To apply this config value to existing pools, the management server must be restarted; changing this config to 'static' prompts the operator to restart the management server to apply the new value.
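A sketch of the pooled client behaviour described above, using the Apache HttpClient 4.x API; renewSession() stands in for the gateway's re-authentication call:

    import java.io.IOException;

    import org.apache.http.HttpResponse;
    import org.apache.http.HttpStatus;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
    import org.apache.http.util.EntityUtils;

    class GatewayClientSketch {
        private final CloseableHttpClient client;

        GatewayClientSketch(int maxConnections) {
            // Bounds concurrent connections to the gateway, mirroring
            // storage.pool.client.max.connections (default: 100).
            PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
            cm.setMaxTotal(maxConnections);
            this.client = HttpClients.custom().setConnectionManager(cm).build();
        }

        HttpResponse get(String url) throws IOException {
            HttpResponse response = client.execute(new HttpGet(url));
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_UNAUTHORIZED) {
                // Consume the entity so the pooled connection is released,
                // then renew the session and retry the request once.
                EntityUtils.consumeQuietly(response.getEntity());
                renewSession();
                response = client.execute(new HttpGet(url));
            }
            return response;
        }

        private void renewSession() { /* re-login to the gateway (omitted) */ }
    }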
* Updated libvirt's native reboot operation for VM on KVM using ACPI event, and Added 'forced' reboot option to stop and start the VMs
- Added 'forced' reboot option for User VM (New parameter 'forced' in rebootVirtualMachine API, to stop and start User VM)
- Added 'forced' reboot option for System VM (New parameter 'forced' in rebootSystemVm API, to stop and then start System VM)
- Added 'forced' reboot option for Router (New parameter 'forced' in rebootRouter API, to force stop and then start Router)
- Added force reboot tests for User VM, System VM and Router
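A minimal sketch of the 'forced' semantics above; helper names are illustrative, not CloudStack's actual API:

    class RebootSketch {
        void reboot(long vmId, boolean forced) {
            if (forced) {
                stopVm(vmId);     // power the VM off through the hypervisor
                startVm(vmId);    // start it again from the Stopped state
            } else {
                acpiReboot(vmId); // libvirt's native reboot via ACPI event
            }
        }

        void stopVm(long vmId) { /* omitted */ }
        void startVm(long vmId) { /* omitted */ }
        void acpiReboot(long vmId) { /* omitted */ }
    }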
* Updated the PowerFlex/ScaleIO volume operations support in CloudStack. Added support for the following:
- PowerFlex volume migration (with snapshots) within the same PowerFlex storage clusters, using native V-Tree migration.
- PowerFlex volume migration (without snapshots) across different PowerFlex storage clusters.
=> The findStoragePoolsForMigration API returns PowerFlex pool(s) of a different instance as suitable pool(s) for volume(s) on a PowerFlex storage pool.
=> Volume(s) with snapshots are not allowed to migrate to a different PowerFlex instance.
=> Volume(s) of a running VM are not allowed to migrate to other PowerFlex storage pools.
=> Volume migration from a PowerFlex pool to a non-PowerFlex pool, and vice versa, is not supported (these rules are sketched after this list).
- Template creation (on secondary storage) from PowerFlex/ScaleIO volume or snapshot.
- Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
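A sketch of the migration rules above as one eligibility check; parameter names are assumptions, not CloudStack's actual API:

    class PowerFlexMigrationRules {
        static boolean canMigrate(boolean srcIsPowerFlex, boolean destIsPowerFlex,
                String srcSystemId, String destSystemId,
                boolean hasSnapshots, boolean vmRunning) {
            if (srcIsPowerFlex != destIsPowerFlex) {
                return false; // PowerFlex <-> non-PowerFlex migration is unsupported
            }
            boolean sameInstance = srcSystemId.equals(destSystemId);
            if (hasSnapshots && !sameInstance) {
                return false; // snapshots only move via native V-Tree migration
                              // within the same PowerFlex instance
            }
            if (vmRunning) {
                return false; // volumes of a running VM stay on their pool
            }
            return true;
        }
    }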
Other Changes:
- Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
- Added new response parameter 'supportsStorageSnapshot' (true/false) to the volume response, and updated the UI to hide the async backup option when taking a snapshot of volume(s) with storage snapshot support.
* Provision to add PowerFlex/ScaleIO storage pool as Primary Storage from UI
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
* Addressed some issues for the operations on PowerFlex storage pool.
- Updated VM snapshot naming so ScaleIO volume names stay unique when a VM has more than one volume.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
* Use a single gateway client per PowerFlex/ScaleIO pool and renew it when the session token expires.
The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
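A minimal renewal sketch built on those two limits; login() stands in for the gateway authentication call:

    class ScaleIOSessionSketch {
        private static final long TOKEN_LIFETIME_MS = 8L * 60 * 60 * 1000; // 8 hours
        private static final long IDLE_LIMIT_MS = 10L * 60 * 1000;         // 10 minutes

        private String token;
        private long createdAt;
        private long lastUsedAt;

        synchronized String validToken() {
            long now = System.currentTimeMillis();
            if (token == null
                    || now - createdAt >= TOKEN_LIFETIME_MS
                    || now - lastUsedAt >= IDLE_LIMIT_MS) {
                token = login(); // re-authenticate before the next API call
                createdAt = now;
            }
            lastUsedAt = now;
            return token;
        }

        private String login() { /* POST credentials to the gateway (omitted) */ return ""; }
    }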
* Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
* Perform basic tests (for connectivity and file system) on router before updating the health check config data
Updated changes to:
- Validate the basic tests (connectivity and file system check) on router
- Cleanup the health check results when router is destroyed
* scaleio: prototype storage plugin
- plugin skeleton
- add storage pool, create/attach data disk
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* kvm: attach disk example
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Updated ScaleIO storage plugin to support Volume operations
* ScaleIO storage plugin - Support for VM operations and other updates
* ScaleIO storage pool plugin changes
- Added validation to check existing ScaleIO storage pool and update capacity details
- Updated resize volume for ScaleIO to round the size up to the 8 GB boundary (see the sketch after this list)
- Added support for setting ScaleIO storage pool statistics (bandwidthLimitInKbps, iopsLimit)
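A sketch of the rounding rule: ScaleIO allocates volumes in 8 GiB increments, so a requested size is rounded up to the next 8 GiB multiple:

    class ScaleIOSizeRounding {
        static final long GRANULARITY = 8L * 1024 * 1024 * 1024; // 8 GiB

        static long roundUpTo8GiB(long sizeInBytes) {
            return ((sizeInBytes + GRANULARITY - 1) / GRANULARITY) * GRANULARITY;
        }
        // Example: a 10 GiB request becomes 16 GiB on the pool.
    }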
* Fixed IOPS validation and volume size update when resizing ScaleIO volume
* Removed connect/disconnect disk changes from ScaleIO storage adaptor
- The ScaleIO datastore driver maps/unmaps the ScaleIO volume (from the MS) using grant/revoke access
- It is not required to map/unmap the ScaleIO volume from the storage adaptor
* Updated connect disk, to wait for ScaleIO volume to become available in the KVM host
* Updated ScaleIO storage provider, pool type, url scheme and related parameters to the new "PowerFlex" brand
* Fixed size rounding issue while creating PowerFlex volume and added validations to PowerFlex Gateway API client
* Updated the host SDC connection check for ScaleIO/PowerFlex pool on host connect
* Updated volume snapshots support for volumes on ScaleIO/PowerFlex storage pool and Added some validations for ScaleIO disks in host
* Added primary storage level configurable setting "storage.pool.disk.wait" to wait for disk availability
- Configure the disk availability wait time, introduced mainly for ScaleIO/PowerFlex storage pools (it can be used for other managed storage), to wait for the disk to become available on the host before performing any operation on it
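A polling sketch of the wait behaviour; diskPath and the one-second poll interval are assumptions, with waitSeconds corresponding to 'storage.pool.disk.wait':

    import java.io.File;

    class DiskWaitSketch {
        static boolean waitForDisk(String diskPath, int waitSeconds) throws InterruptedException {
            long deadline = System.currentTimeMillis() + waitSeconds * 1000L;
            while (System.currentTimeMillis() < deadline) {
                if (new File(diskPath).exists()) {
                    return true; // the SDC has surfaced the device on the host
                }
                Thread.sleep(1000); // re-check once per second until the timeout
            }
            return false; // caller fails the operation
        }
    }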
* Enabled template spooling to ScaleIO/PowerFlex storage pool and create VM from the spooled template.
Added ScaleIO SDC limits support for volumes using offering parameters: bandwidthLimitInKbps, iopsLimit.
* Added support for VM snapshots on ScaleIO/PowerFlex storage pool
Minor improvements for IOPS (SDC Limits) configuration
* Updated access for ScaleIO/PowerFlex volumes on VM Start and Stop
Added primary storage level configurable setting "storage.pool.client.timeout" for storage API client
Enabled cluster wide storage pool support for ScaleIO/PowerFlex storage
Minor improvements for ScaleIO/PowerFlex disk access in the KVM host
* Added support for direct download of templates (raw, qcow2) on ScaleIO/PowerFlex storage pool
* Added support for config drives in host cache for KVM
- Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
- Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
- Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
- Added new parameter "vm.configdrive.host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path for config drives
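A decision sketch for where the config drive is built, following the settings above; the helper flags are illustrative:

    class ConfigDriveLocationSketch {
        static String chooseLocation(boolean forceHostCache,
                boolean poolSupportsConfigDrive, boolean useHostCacheOnUnsupportedPool) {
            if (forceHostCache) {
                return "HOST_CACHE"; // vm.configdrive.force.host.cache.use = true
            }
            if (!poolSupportsConfigDrive && useHostCacheOnUnsupportedPool) {
                return "HOST_CACHE"; // vm.configdrive.use.host.cache.on.unsupported.pool
            }
            return "PRIMARY_POOL"; // default placement on the primary storage pool
        }
    }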
* Updated disk access while migrating the VM with volumes on ScaleIO/PowerFlex storage pool
Changed the parameter "vm.configdrive.host.cache.location" to "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties to specify the host cache path
Changes to create config drives on the "/config" directory on the host cache path
Changes to support migrating a VM with a config drive on the host cache path
* Additional changes to support migrating a VM with a config drive on the host cache
* Detect the virtual size from the template URL while registering direct download qcow2 templates (for the KVM hypervisor); see the header-read sketch below
Updated full deployment destination for preparing the network(s) on VM start
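A simplified sketch of that detection: the qcow2 header stores the virtual disk size as an 8-byte big-endian value at offset 24, so reading the first 32 bytes of the file at the URL is enough (the actual implementation may fetch a byte range instead of opening a full stream):

    import java.io.DataInputStream;
    import java.io.InputStream;
    import java.net.URL;

    class Qcow2SizeProbe {
        static long virtualSize(String templateUrl) throws Exception {
            try (InputStream in = new URL(templateUrl).openStream();
                 DataInputStream data = new DataInputStream(in)) {
                byte[] header = new byte[32];
                data.readFully(header);
                if (header[0] != 'Q' || header[1] != 'F' || header[2] != 'I') {
                    throw new IllegalArgumentException("not a qcow2 image");
                }
                long size = 0;
                for (int i = 24; i < 32; i++) {
                    size = (size << 8) | (header[i] & 0xff); // big-endian read
                }
                return size; // virtual disk size in bytes
            }
        }
    }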
* Propagate the direct download certificates uploaded to the newly added KVM hosts
* Code improvements for ScaleIO/PowerFlex storage plugin
* Updated storage stats collection and tests for ScaleIO/PowerFlex storage plugin
* Fix for template size of direct download templates on capacity check for ScaleIO/PowerFlex storage pool
Updated data object grant and revoke access for connected SDCs to ScaleIO/PowerFlex storage pool
* Discover the template size for direct download templates using any available host from the zones specified on template registration
When zones are not specified while registering a template, template size discovery is performed using any available host, picked randomly from one of the available zones
* Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
* Ensure the volume to be expunged is expunge-ready during storage cleanup
* Do not set the storage migration flag for the volumes on zone wide PowerFlex/ScaleIO pool when listing the hosts available for cross-cluster migration
* Release the VM resources when the VM is synced to the Stopped state on PowerReportMissing (after the grace period)
* Added alerts for PowerFlex/ScaleIO SDC disconnection on the host(s)
* Retry VM deployment/start when the host cannot access volume/template on the ScaleIO/PowerFlex storage
* Changes to find a potential host that can access the ScaleIO/PowerFlex storage pool
* Updated ScaleIO/PowerFlex storage pool stats for checking the available capacity and usage
* Updated ScaleIO/PowerFlex volume naming convention to avoid naming conflicts on sharing
* Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
- Do not trigger any DeleteCommand for never-used or downloaded templates, as these don't exist on the datastore and cannot be deleted from it
* Updated ScaleIO/PowerFlex storage pool capacity stats
* Cleanup unused templates and host entries on PowerFlex/ScaleIO storage pool deletion
* Check whether the router filesystem is writable, before performing health checks
- Introduced a new test "filesystem.writable.test" to check whether the filesystem is writable
- The router health checks keep the config info at "/var/cache/cloud" and update the monitor results at "/root"; these are different partitions, so test both locations.
* Updated the router filesystem writable check to use a script, instead of command execution
- Added new script "filesystem_writable_check.py" at /opt/cloud/bin/ to check whether the filesystem is writable
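The shipped check is the Python script above; the same write-probe idea, sketched in Java for illustration:

    import java.io.File;
    import java.io.IOException;

    class WritableCheckSketch {
        // Try to create and delete a small file in the directory; failure
        // indicates a read-only (or full) filesystem.
        static boolean isWritable(String dir) {
            try {
                File probe = File.createTempFile("rw_probe", ".tmp", new File(dir));
                return probe.delete();
            } catch (IOException e) {
                return false;
            }
        }
        // The router applies this to both /var/cache/cloud and /root,
        // since they live on different partitions.
    }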
* Update volume stats (physical and virtual size) for the volumes on PowerFlex/ScaleIO storage pool
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
The BackupSync task would switch between databases to update backup usage metrics in the cloud_usage.usage_backup table. The current framework and its usage in ManagedContext cause database connection (LegacyTransaction) leaks. When the thread runs frequently, the issue is easily reproducible and can be confirmed via heap dump analysis or JMX MBeans. This is fixed by moving the backup usage data update to the usage server, by publishing usage events instead of switching between databases in a local thread inside a ManagedContextRunnable.
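A hypothetical sketch of the new flow; every name below is illustrative, not CloudStack's exact usage API:

    class BackupUsagePublisher {
        static class UsageEvent {
            final String type;
            final long accountId, zoneId, resourceId, size;

            UsageEvent(String type, long accountId, long zoneId, long resourceId, long size) {
                this.type = type;
                this.accountId = accountId;
                this.zoneId = zoneId;
                this.resourceId = resourceId;
                this.size = size;
            }
        }

        void reportBackupUsage(long accountId, long zoneId, long backupId, long sizeBytes) {
            // Publish on the event bus instead of writing to cloud_usage directly;
            // the usage server consumes the event on its own database connection
            // and updates cloud_usage.usage_backup there.
            publish(new UsageEvent("BACKUP.USAGE", accountId, zoneId, backupId, sizeBytes));
        }

        void publish(UsageEvent event) { /* hand off to the event bus (omitted) */ }
    }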
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit b54d19b3b9)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixes #4090
When trying to migrate a VM across two clusters, if a snapshot has been deleted and garbage collection has run to update the removed field, it is not possible to migrate the instance due to a null pointer exception.
(cherry picked from commit 6a683dcf77)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When scripts/vm/hypervisor/kvm/kvmvmactivity.sh is called with an incorrect file name, an error is printed which is then interpreted as output from the script.
When an incorrect file name is passed the script prints out:
stat: cannot stat ‘b51d7336-d964-44ee-be60-bf62783dabc’: No such file or directory
=====> DEAD <======
The KVMHAVMActivityChecker.java checkingHB() process is expecting just
=====> DEAD <======
but gets the unexpected error message as well and interprets the VM as alive.
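One defensive way to read the script output: scan for the exact marker line instead of assuming the output contains nothing else, so stray stderr text (like the stat error above) cannot change the verdict:

    class ActivityOutputParserSketch {
        static boolean vmIsDead(String scriptOutput) {
            for (String line : scriptOutput.split("\n")) {
                if ("=====> DEAD <======".equals(line.trim())) {
                    return true; // marker found even if error noise precedes it
                }
            }
            return false;
        }
    }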
(cherry picked from commit 23fa647985)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When a secondary NIC is added to a guest VM, the VM gets the wrong hostname "infiniteh" from the DHCP server.
infiniteh --> infinite
cat /etc/dhcphosts.txt
02:00:0b:ef:00:04,set:192_168_4_18,192.168.4.18,gumd-tes3,infiniteh
* changed template and binaries iso to public links
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* iso state check and timeout fixes
refactoring
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* changed timeouts
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* db.properties: Enforce UTC timezone by default
This gives users the ability to change the timezone
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* fix server time to UTC
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update the db.usage.url.params=serverTimezone=UTC per Liridon's testing
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update AncientDataMotionStrategy.java
Fix: when secondary storage usage is > 90%, volume migration across primary storage causes the migration to fail and loses the volume
* Update AncientDataMotionStrategy.java
When a volume is migrated across primary storage and no secondary storage is available (or its used capacity is > 90%), the migration is canceled.
Before this change, if secondary storage could not be found, copyVolumeBetweenPools returned null.
copyAsync considers answer == null to be a sign of successful task execution, so it deletes the volume on the old primary storage. This is the root cause of the data loss, because the volume was never migrated at all.
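A sketch of the corrected handling: a null answer must count as failure, so the source volume is deleted only after a confirmed successful copy (the Answer stand-in below is minimal, not CloudStack's full class):

    class CopyVolumeResultSketch {
        interface Answer { boolean getResult(); } // minimal stand-in

        void handleCopyResult(Answer answer) {
            if (answer == null || !answer.getResult()) {
                // Previously a null answer fell through as "success" and the
                // source volume was deleted although no copy had happened.
                throw new IllegalStateException("volume copy failed; source volume preserved");
            }
            deleteSourceVolume(); // safe only after confirmed success
        }

        void deleteSourceVolume() { /* omitted */ }
    }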
* code in comment removed
Co-authored-by: div8cn <35140268+div8cn@users.noreply.github.com>
Co-authored-by: Daan Hoogland <dahn@onecht.net>
* 4.13:
Snapshot deletion issues (#3969)
server: Cannot list affinity group if there are hosts dedicated… (#4025)
server: Search zone-wide storage pool when allocation algorithm is firstfitleastconsumed (#4002)
* Fixes snapshot deletion
* Remove legacy '@Component', it is not necessary in this bean/class.
* Fix log message missing %d and remove snapshot from DB
* Remove "dummy" boolean return statement
* Manage snapshot deletion for KVM + NFS (primary storage)
* checkstyle trailing spaces
* rename options strings to *_OPTION
* Fix typo on deleteSnapshotOnSecondaryStorage and enhance log message
* Move the snapshotDao.remove(snapshotId); (#4006)
* Fix deletesnapshot workflow to handle both snapshots created in primary storage and snapshots backed up to secondary storage
* Fix extra space
* refactor out separate handling methods for secondary and primary (reducing returns)
* return false on unexpected error or log when expected
* != instead of ==
* secondary instead of backup storage
* init to null
* Handle snapshot deletion on primary storage. When primary store ref not found for snapshot do not fail the operation.
* Fix debug levels on log messages
Co-authored-by: GabrielBrascher <gabriel@apache.org>
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Co-authored-by: Harikrishna Patnala <harikrishna.patnala@gmail.com>
Co-authored-by: nvazquez <nicovazquez90@gmail.com>