This PR aims at restoring the previous level of leniency in listing templates on stores that have been marked as deleted / removed by updating the DB.
While Cloudstack doesn't allow deleting stores that have resources on them, it may so happen that users may mimic a deletion of a store by merely updating the DB. Under such a case, listing of templates is hampered due to an NPE that is caused. (as seen in #4606)
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ
Documentation PR: apache/cloudstack-documentation#169
This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack
Other improvements addressed in addition to PowerFlex/ScaleIO support:
- Added support for config drives in host cache for KVM
=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
- Updated full deployment destination for preparing the network(s) on VM start
- Propagate the direct download certificates uploaded to the newly added KVM hosts
- Discover the template size for direct download templates using any available host from the zones specified on template registration
=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
- Retry VM deployment/start when the host cannot grant access to volume/template
- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
- Check the router filesystem is writable or not, before performing health checks
=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)
* Addressed some issues for few operations on PowerFlex storage pool.
- Updated migration volume operation to sync the status and wait for migration to complete.
- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.
- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
Other fixes included:
- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
- Perform basic tests (for connectivity and file system) on router before updating the health check config data
=> Validate the basic tests (connectivity and file system check) on router
=> Cleanup the health check results when router is destroyed
* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0
* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool
and Minor improvements in PowerFlex/ScaleIO storage plugin code
* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.
- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.
* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py
* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.
* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
* Prevent KVM from performing volume migrations of running instances
KVM has a limitation to modify instances definitions while they are on running state. Therefore, it is not possible to change volumes backend location easily.
There is a problem in the `migrateVolume` API. This API command ignores that limitation and causes an inconsistence on the database. ACS processes the migrate command, copies the volume to the destination storage, modifies the database and finishes the process with success. However, the running backend is still using the "old volume file".
This PR intends to prevent KVM to perform volumes migrations while KVM instances are in the running state and inform the user of an alternative API command that enables such operation on running instances.
* Update VolumeApiServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Co-authored-by: Rohit Yadav <rohit@apache.org>
* Show network name in exception message
* Update server/src/main/java/com/cloud/vm/UserVmManagerImpl.java
Co-authored-by: dahn <daan.hoogland@gmail.com>
When domain is deleted, all the settings configured under
the domain scope still exists in domain_details table.
All the entries for the domain should be deleted as well
- Fixes inter-cluster migration of VMs
- Allows migration of stopped VM with disks attached to different and suitable pools
- Improves inter-cluster detached volume migration
- Allows inter-cluster migration (clusters of same Pod) for system VMs, VRs on VMware
- Allows storage migration for stopped system VMs, VRs on VMware within same Pod if StoragePool cluster scopetype
Linked Primate PR: https://github.com/apache/cloudstack-primate/pull/789 [Changes merged in this PR after new UI merge]
Documentation PR: https://github.com/apache/cloudstack-documentation/pull/170
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Steps to reproduce the issue:
(1)Create 10000 service offerings (by db changes below or cloudmonkey).
```
DROP PROCEDURE IF EXISTS cloud.insert_service_offering;
DELIMITER $$
CREATE PROCEDURE cloud.insert_service_offering()
BEGIN
DECLARE count INT DEFAULT 10000;
SET @offeringid = (select max(id)+1 from disk_offering);
WHILE count > 0 DO
INSERT INTO disk_offering (id,name,uuid,display_text,disk_size,type,created) values (@offeringid,'test-offering-wei',uuid(), 'test-offering-wei',0,'Service',now());
INSERT INTO service_offering (id,cpu,speed,ram_size) values (@offeringid, 1, 500,256);
SET @offeringid = @offeringid + 1;
SET count = count - 1;
END WHILE;
END $$
DELIMITER ;
CALL cloud.insert_service_offering();
mysql> CALL cloud.insert_service_offering();
Query OK, 0 rows affected (2 min 30.85 sec)
```
(2) Check the total time of periodical capacity check in cloudstack.
Without this patch, it spend 2.5 seconds (2 hosts)
```
2021-01-15 16:10:12,793 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Running Capacity Checker ...
2021-01-15 16:10:15,287 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Done running Capacity Checker ...
```
With this patch ,it spend 1.3 seconds (2 hosts)
```
2021-01-15 16:12:43,604 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Running Capacity Checker ...
2021-01-15 16:12:44,927 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Done running Capacity Checker ...
```
If there are 100 hosts, the total time will be reduced from 100+ seconds to around 10 seconds.
* 4.15:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
* 4.14:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
We can use cloudmonkey to scale a vm with dynamic offering, to same offering but with different cpunumber or memory.
Enable it on UI to improve user experience.
This PR fixes an issue when move a vm from an account to another account.
Steps to reproduce the issue
(1) create a vm with multiple shared networks (in advanced zone, or advanced zone with security groups)
(2) create another account (in same domain who can also access the shared networks)
(3) move vm to new account, with a list of networkid
expected result: the vm has nics on the networks in same order as specified in API request, and nics have the same ips as before actual result: network order is not same as specified, ips are changed.
* server: fix cannot create vm if another vm with same name has been added and removed on the network
steps to reproduce the issue
(1) create vm-1 on network-1
(2) add vm-1 to network-2
(3) remove vm-1 from network-2
(4) create another vm with same name vm-1 on network-2
expected result: operation succeed
actual result: operation failed.
* #4600: add back a removed line
Update the guest OS from the OVF file after upload is completed
This PR fixes the template upload from local on VMware
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
This PR addresses an error that appears when you try to add a new host. I don't even understand why there was a cast to String in the first place. I will assume some classes send HypervisorType and some send a string (empty or otherwise). Shouldn't this be addressed to use the same type everywhere? With this fix adding a new xenserver host works fine.
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Add vpcid in usage network response
Currently vpcid is displayed in listUsageNetworks response.
Add the vpcid so that we can see to which vpc, the network belongs
* use new function to get removed
* Display VPC name to which the network belongs to
If an isolated network is created in VPC then display
its name along with vpc id which is used for UI
* Change description
* vpc: fix ips on wrong interfaces after rebooting vpc vrs
* #4467: Rename to updateNicWithDeviceId
* CLSTACK-8923 vr: Force a restart of keepalived if conntrackd is not running or configuration has changed
This PR removes system reserved IP addresses from the options of acquiring IP addresses. Choosing any reserved IP address results in an error. The IP addresses should not have been displayed in the first place.
Fixes: #4310
When we try to reset the site 2 site vpn connection while
the VR's are being restarted, the connection enters the
PENDING state and we cant reset the connection.
So change the state from PENDING to disconnected.
Steps to reproduce the issue
1.create a VPC with a tier (vpc-001-001 in vpc-001), create a vm
2.create a VPC with a tier (vpc-002-001 in vpc-002) with different cidr, create a vm
3.create custom gateway for both vpn
4.enable site-to-site vpn on both vpn, and add vpn connection to each other. both should be "Connected"
5.restart vpc-001 with cleanup and monitor it
6.when the first router is destroyed, go to site-to-site vpn page and reset vpn connection.
7.we will get an error "Resource [DataCenter:1] is unreachable: Unable to apply site 2 site VPN configuration, virtual router is not in the right state"
and vpn connection is stuck at Pending
8.When vpc is restarted, go to site-to-site vpn page and reset vpn connection.
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
this contains other changes
(1) add isrouting field for vm templates on UI
(2) show register URL of template/iso on UI
(3) set 'Bootable' field to changable for existing ISO
If the resource state of hypervisor in "Maintenance" then it
should be considered as offline even though the agent state
is "Up". Since its in maintenance mode, it cant be used to
allocate VM's and hence can't be considered towards resource
allocation
If vm has last host_id specified, cloudstack will try to start vm on it at first.
However, host tag is checked, but guest os preference is not checked.
for new vm, it will be deployed to the preferred host as we expect.
Fixes: #3554 (comment)
* added defensive checks for avoiding NPE and list projects API fix
* list projects with account name provided to not include users in the account in response
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This feature enables the following:
Balanced migration of data objects from source Image store to destination Image store(s)
Complete migration of data
setting an image store to read-only
viewing download progress of templates across all data stores
Related Primate PR: apache/cloudstack-primate#326
* Display acl name in listNetworks response
Display acl name along with its id so that we
dont need to make extra api call to get acl name
* Add since tag
After a few hours running with InfluxDB configured, CloudStack hangs due to OutOfMemoryException raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):
2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
...
at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
Context on InfluxDB Batch: Enabling batch on InfluxDB is great and speeds writing but it requires caution to avoid Zombie threads.
Solution: This happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to add: influxDB.close().
When executing request assignVirtualMachine with null domainID and a valid projectID then a NullPointerException happens at DomainChecker.java.
Command example:
assign virtualmachine virtualmachineid=vmID projectid=projectID account=admin
The NullPointerException that is thrown at DomainChecker is handled at AssignVMCmd.java#L142, resulting in the following log message: Failed to move vm null.
While remove secondary nic from a Running vm, if update the default nic to the secondary nic before the nic is removed, the vm will not have default nic (and cannot be started) when both operations are completed.
It is because UpdateDefaultNic api is not handled as a vm work job (AddNicToVMCmd and RemoveNicFromVMCmd are), it is processed before nic is removed. The result is that secondary nic becomes default nic and got removed.
This PR aims to fix the issue below
Create a network offering for isolated network, services: Dns/Dhcp/Userdata, and enable it
create a isolated network with the new offering
create a vm
check the guest IP of virtual router,
restart network with cleanup
check the guest IP of new virtual router
The IP in step4 and step6 should be the same, but they are different actually.
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
The "hypervisor" field in listvmsnapshot response will
be used in primate to enable/disable creating snapshot
from vm snapshot functionality.
Creating snpashot from vm snapshot will be enabled only if
hypervisor is KVM
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861
The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes