This PR aims at introducing persistence mode in L2 networks and enhancing the behavior in Isolated networks
Doc PR apache/cloudstack-documentation#183
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This contains 3 main changes
(1) add NETWORK_STATS_ethX for all nics with public ips in VPC VRs (current: NETWORK_STATS_eth1)
(2) DO NOT create records in user_statistics for each VPC tier (only one record per public nic per VPC VR)
(3) send NetworkUsageCommand before unplugging a NIC with public IPs from VPC VR
Public IP addresses dedicated to one domain should not be accessed
by other domains. Also, root admin should be able to display all
public ip addresses in system.
Currently following issues exist
1. Public IP address assigned to one domain can be accessed by
other sibling domains
If use.system.public.ip is false then child domains should not
see public ip of ROOT domain
Before fix
```
(test1) mgt01 > list publicipaddresses listall=true fordisplay=true allocatedonly=false forvirtualnetwork=true filter=ipaddress,
{
"count": 59,
"publicipaddress": [
```
After fix
```
(test) mgt01 > list publicipaddresses listall=true fordisplay=true allocatedonly=false forvirtualnetwork=true filter=ipaddress,
{
"count": 10,
```
* server: create DB entry for storage pool capacity when create storage pool
* Revert "server: create DB entry for storage pool capacity when create storage pool"
This reverts commit e790167bfe.
* server: create DB entry for storage pool capacity when create zone-wide storage pools
Update new systemvmtemplate for 4.15.1.0; synced:
http://download.cloudstack.org/systemvm/4.15/
A new template is necessary due to many security fixes over the last year, the 4.15.0 systemvmtemplate was created about a year ago.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
* Update vm_template table removed field when template is deleted
* Update method name
* address comment
* Extracted code to separate methods
* Address test failure
* refactor test cleanup
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
* Updated libvirt's native reboot operation for VM on KVM using ACPI event, and Added 'forced' reboot option to stop and start the VM (using rebootVirtualMachine API)
* Added 'forced' reboot option for System VM and Router
- New parameter 'forced' in rebootSystemVm API, to stop and then start System VM
- New parameter 'forced' in rebootRouter API, to force stop and then start Router
* Added force reboot tests for User VM, System VM and Router
Duplicated volumes after failed migration in Allocated state
Fix: Clean up the duplicate volume when the destination managed volume creation failed on migrate volume operation
While finding pools for volume migration list following compatible storages:
- all zone-wide storages of the same hypervisor.
- when the volume is attached to a VM, then all storages from the same cluster as that of VM.
- for detached volume, all storages that belong to clusters of the same hypervisor.
Fixes#4692Fixes#4400
If the template from which VR is created got deleted, the state
is set to inactive and removed to null.
Since the template is already deleted, the VR can't be created
using this template again.
If someone restarts network with cleanup then it will try to
deploy the vr from the old non existing template again.
So search only for active template which are not yet deleted.
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ
Documentation PR: apache/cloudstack-documentation#169
This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack
Other improvements addressed in addition to PowerFlex/ScaleIO support:
- Added support for config drives in host cache for KVM
=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
- Updated full deployment destination for preparing the network(s) on VM start
- Propagate the direct download certificates uploaded to the newly added KVM hosts
- Discover the template size for direct download templates using any available host from the zones specified on template registration
=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
- Retry VM deployment/start when the host cannot grant access to volume/template
- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
- Check the router filesystem is writable or not, before performing health checks
=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)
* Addressed some issues for few operations on PowerFlex storage pool.
- Updated migration volume operation to sync the status and wait for migration to complete.
- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.
- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
Other fixes included:
- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
- Perform basic tests (for connectivity and file system) on router before updating the health check config data
=> Validate the basic tests (connectivity and file system check) on router
=> Cleanup the health check results when router is destroyed
* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0
* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool
and Minor improvements in PowerFlex/ScaleIO storage plugin code
* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.
- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.
* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py
* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.
* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
- Fixes inter-cluster migration of VMs
- Allows migration of stopped VM with disks attached to different and suitable pools
- Improves inter-cluster detached volume migration
- Allows inter-cluster migration (clusters of same Pod) for system VMs, VRs on VMware
- Allows storage migration for stopped system VMs, VRs on VMware within same Pod if StoragePool cluster scopetype
Linked Primate PR: https://github.com/apache/cloudstack-primate/pull/789 [Changes merged in this PR after new UI merge]
Documentation PR: https://github.com/apache/cloudstack-documentation/pull/170
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Steps to reproduce the issue:
(1)Create 10000 service offerings (by db changes below or cloudmonkey).
```
DROP PROCEDURE IF EXISTS cloud.insert_service_offering;
DELIMITER $$
CREATE PROCEDURE cloud.insert_service_offering()
BEGIN
DECLARE count INT DEFAULT 10000;
SET @offeringid = (select max(id)+1 from disk_offering);
WHILE count > 0 DO
INSERT INTO disk_offering (id,name,uuid,display_text,disk_size,type,created) values (@offeringid,'test-offering-wei',uuid(), 'test-offering-wei',0,'Service',now());
INSERT INTO service_offering (id,cpu,speed,ram_size) values (@offeringid, 1, 500,256);
SET @offeringid = @offeringid + 1;
SET count = count - 1;
END WHILE;
END $$
DELIMITER ;
CALL cloud.insert_service_offering();
mysql> CALL cloud.insert_service_offering();
Query OK, 0 rows affected (2 min 30.85 sec)
```
(2) Check the total time of periodical capacity check in cloudstack.
Without this patch, it spend 2.5 seconds (2 hosts)
```
2021-01-15 16:10:12,793 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Running Capacity Checker ...
2021-01-15 16:10:15,287 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Done running Capacity Checker ...
```
With this patch ,it spend 1.3 seconds (2 hosts)
```
2021-01-15 16:12:43,604 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Running Capacity Checker ...
2021-01-15 16:12:44,927 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Done running Capacity Checker ...
```
If there are 100 hosts, the total time will be reduced from 100+ seconds to around 10 seconds.
* 4.15:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
* 4.14:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
This PR fixes an issue when move a vm from an account to another account.
Steps to reproduce the issue
(1) create a vm with multiple shared networks (in advanced zone, or advanced zone with security groups)
(2) create another account (in same domain who can also access the shared networks)
(3) move vm to new account, with a list of networkid
expected result: the vm has nics on the networks in same order as specified in API request, and nics have the same ips as before actual result: network order is not same as specified, ips are changed.
* server: fix cannot create vm if another vm with same name has been added and removed on the network
steps to reproduce the issue
(1) create vm-1 on network-1
(2) add vm-1 to network-2
(3) remove vm-1 from network-2
(4) create another vm with same name vm-1 on network-2
expected result: operation succeed
actual result: operation failed.
* #4600: add back a removed line
Fix db upgrade path conflict, add 4.15.1.0->4.16.0.0 for master, bump
systemvmtemplate version to 4.16.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Fix for mapping guest OS type read from OVF to existing guest OS in CloudStack database while registering VMware template
* Added unit tests to String Utils methods and updated the code
* Updated the java doc section
* Updated os description logic to keep equals ignore match with guest os display name
Update the guest OS from the OVF file after upload is completed
This PR fixes the template upload from local on VMware
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
This PR addresses an error that appears when you try to add a new host. I don't even understand why there was a cast to String in the first place. I will assume some classes send HypervisorType and some send a string (empty or otherwise). Shouldn't this be addressed to use the same type everywhere? With this fix adding a new xenserver host works fine.
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Setting snapshot state to error on timeout
* Setting removed field so snapshot record is ignored by garbage collection
* Removed explicitly setting error status, renamed method from markFailed to markRemoved
* Renamed method, moved code a few lines down
* Moved remove logic
* Removed unused service
* Moved removed logic - last time, promise
For Basic network isolation methods are not provided, and exception is
thrown when trying to encode the Vlan id. That's why we have to check
before encoding that the list with isolation methods is not empty
* support for handling incremental snaps (on DB entries) on xen
* Addressed comments
* Update NfsSecondaryStorageResource.java
adjusted space in comment/ log
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This feature enables the following:
Balanced migration of data objects from source Image store to destination Image store(s)
Complete migration of data
setting an image store to read-only
viewing download progress of templates across all data stores
Related Primate PR: apache/cloudstack-primate#326
After a vm is shutdown, the power state isn't updated immediately. This prevents changing the service offering.
This PR updates the power state immediately after the vm is confirmed to be shutdown.
Fixes: #3159
While remove secondary nic from a Running vm, if update the default nic to the secondary nic before the nic is removed, the vm will not have default nic (and cannot be started) when both operations are completed.
It is because UpdateDefaultNic api is not handled as a vm work job (AddNicToVMCmd and RemoveNicFromVMCmd are), it is processed before nic is removed. The result is that secondary nic becomes default nic and got removed.
* engine: honour bypass VLAN id/range for L2 networks
Commit e894238d904a9c49c1140371f612a51d251efc1 (#3899) allowed private
gateways to bypass vlan check while refactoring it did not cover the
case for L2 but only shared network. This fix will re-enable honouring
the bypass vlan check option for L2 guest network (in addition to the
Shared networks).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update NetworkOrchestrator.java
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861
The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.
* DB : Add support for MySQL 8
- Splits commands to create user and grant access on database, the old
statement is no longer supported by MySQL 8.x
- `NO_AUTO_CREATE_USER` is no longer supported by MySQL 8.x so remove
that from db.properties conn parameters
For mysql-server 8.x setup the following changes were added/tested to
make it work with CloudStack in /etc/mysql/mysql.conf.d/mysqld.cnf and
then restart the mysql-server process:
server_id = 1
sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION"
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=1000
log-bin=mysql-bin
binlog-format = 'ROW'
default-authentication-plugin=mysql_native_password
Notice the last line above, this is to reset the old password based
authentication used by MySQL 5.x.
Developers can set empty password as follows:
> sudo mysql -u root
ALTER USER 'root'@'localhost' IDENTIFIED BY '';
In libvirt repository, there are two related commits
2019-08-23 13:13 Daniel P. Berrangé ● rpm: don't enable socket activation in upgrade if --listen present
2019-08-22 14:52 Daniel P. Berrangé ● remote: forbid the --listen arg when systemd socket activation
In libvirt.spec.in
/bin/systemctl mask libvirtd.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-ro.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-admin.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tls.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tcp.socket >/dev/null 2>&1 || :
Co-authored-by: Wei Zhou <w.zhou@global.leaseweb.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
When the static route service is not available on the VPC and a static route is created, the static route is created in a revoked state.
Currently, the UI doesn't distinguish between active or revoked static routes.
This PR adds the missing state filter to the list routes command and only lists active routes in the UI.
It also ignores revoked routes when the private gateway is being removed but clears out the inactive routes before the gateway is removed.
Fixes#2908
This fixes NPE caused due to merge conflict fix from the forward merge
commit 562a7db8df and fixes travis test
regression:
=== TestName: test_01_reset_vm_on_reboot | Status : SUCCESS ===
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixes#4056Fixes#4107Fixes#4113Fixes#4133
Fixes deployment, template and network deletion.
Also allows filetering in listKubernetesSupportedVersions with keyword
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This PR adds support for the OOBM Redfish protocol, implementing a Java client to send HTTP requests to Redfish supported systems.
Implementation overview:
- Redfish Java client: a Java Client for Redfish that makes Redfish actions available to the HA workflow via an OOB driver.
- OOB Redfish driver: a new Out-of-band driver was created for Redfish, allowing to integrate the Redfish Client with the CloudStack Out-of-band management implementation.
Fixes: #3624
Adding the following fixes so primate can work without issues :
- Adding pagination for listNetworkAclLists
- Adding pagination for listRoles
- Returning mshost uuid rather than msid in list hosts response
- Allowing listVirtualMachinesMetrics to respect hostid
- Fixing return all details in template response
When a host connects to a management server, the host IP address and the certificate are stored in memory on the management server. This mapping is checked periodically to determine if any certificates are due to expire.
Before a certificate is renewed, a few checks are done to determine if the host is connected to the management server by fetching the host record from the database. The problem here is if the wrong record is fetched, the host is not checked for renewal.
This PR improves the host record fetch from the database by looking only at hosts that are not removed.
Fixes: #4129
- Create a role from any of the existing role, using new parameter roleid in createRole API
- Import a role with its rules, using a new importRole API
- New default roles for Read-Only and Support Admin & User
- No modifications allowed for Default roles
- Cleaned up old NetApp APIs from role_permissions table.
While migrate a vm, in the popup, the host dedicated to other accounts/domains are also 'Suitable" for migration, which is obviously wrong.
The same issue happens with api findHostsForMigration
* Enable unmanaging guest VMs
* Minor fixes
* Fix stop usage event only if VM is not stopped when unmanaging
* Rename unmanaged VMs manager
* Generate netofferingremove usage event if VM is not stopped
* Generate usage event VM snapshot primary off when unmanaging
This PR fixes an issue where an instance fails to deploy due to a null pointer when using an L2 Guest Network with DefaultL2NetworkOfferingConfigDrive on Xenserver. It also fixes migrating an instance to another host.
This has been tested by:
- Creating an L2 Guest network, using DefaultL2NetworkOfferingConfigDrive as the network offering.
- Deploying an instance using the L2 Guest network created.
- Migrating the instance away from the host and back
Repro Steps:
1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. NPE occurs when MS tries to find host to migrate the VM and then scale.
Root cause: VM profile is not initiated properly with serviceoffering before planning for deployment
Solution: Iniate VM profile with serviceoffering and also make sure custom compute parameters are handled
Currently CloudStack is using logging frameworks as log4j and Java util logging, logging wrappers as slf4j and Apache common logging.
Here changes are to made it uniform, using only log4j framework.
Removed Java util logging, slf4j and Apache common logging.
BackupSync task would switch between databases to update backup usage
metrics in the cloud_usage.usage_backup table. The current framework
and the usage in ManagedContext causes database connection
(LegacyTransaction) leaks. When the thread runs faster, the issue is
easily reproducible and checking via heap dump analysis or using JMX
MBeans. This fixes by moving the task of backup data updation for
usage data to the usage server by publishing usage events instead of
switching between databases in a local thread while in a
ManagedContextRunnable.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Root cause:
Even though dynamic scaling job is handled in vmworkjob queue which ensures serilizing multiple jobs but the database updating and generating usage events are out of the job queue.
Solution:
Moved all updations into the job queue
Firstly I have tested all the scenarios to check if nothing is broken:
Scaling on a running VM with normal compute offering
Scaling on a stopped VM with normal compute offering
Scaling on a running VM with custom compute offering
Scaling on stopped VM with custom compute offering
Scaling on stopped/running VM between custom compute offering and normal compute offering and combinations among these. Checked if the custom parameters have been populated or deleted accordingly based on the offering to which the VM is scaled
Since this is a corner scenario I could not test the exact point where two usage events are recorded at the same time for two different API calls on same VM.
Fixes#4090
When trying to migrate a VM across 2 clusters, if a snapshot has been deleted and garbage collection has run to update the removed field, it is not possible to migrate the instance due to a null pointer.
When expunge a Running vm, vm will be stopped with forcestop=false which does not make sense. we should honor vm.destroy.forcestop in global setting, or always set forcestop=true.
This upgrades the systemvmtemplate base to Debian 10 with openjdk-11 and a newer strongswan package.
Fixes#3654
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update AncientDataMotionStrategy.java
fix When secondary storage usage is> 90%, VOLUME migration across primary storage will cause the migration to fail and lose VOLUME
* Update AncientDataMotionStrategy.java
Volume is migrated across Primary storage. If no secondary storage is available(Or used capacity> 90% ), the migration is canceled.
Before modification, if secondary storage cannot be found, copyVolumeBetweenPools return NUll
copyAsync considers answer = null to be a sign of successful task execution, so it deletes the VOLUME on the old primary storage. This is the root cause of data loss, because VOLUME did not perform the migration at all.
* code in comment removed
Co-authored-by: div8cn <35140268+div8cn@users.noreply.github.com>
Co-authored-by: Daan Hoogland <dahn@onecht.net>
* 4.13:
Snapshot deletion issues (#3969)
server: Cannot list affinity group if there are hosts dedicated… (#4025)
server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)
* Fixes snapshot deletion
* Remove legacy '@Component', it is not necessary in this bean/class.
* Fix log message missing %d and remove snapshot on DB
* Remove "dummy" boolean return statement
* Manage snapshot deletion for KVM + NFS (primary storage)
* checkstyle trailing spaces
* rename options strings to *_OPTION
* Fix typo on deleteSnapshotOnSecondaryStorage and enhance log message
* Move the snapshotDao.remove(snapshotId); (#4006)
* Fix deletesnapshot worflow to handle both snapshots created in primary storage and snapshots backed up to secondary storage
* Fix extra space
* refactor out separate handling methods for secondary and primary (reducing returns)
* return false on unexpected error or log when expected
* != instead of ==
* secondary instead of backup storage
* init to null
* Handle snapshot deletion on primary storage. When primary store ref not found for snapshot do not fail the operation.
* Fix debug levels on log messages
Co-authored-by: GabrielBrascher <gabriel@apache.org>
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Co-authored-by: Harikrishna Patnala <harikrishna.patnala@gmail.com>
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
As previously described by PR #3929:
If vm has attached ISO, the migration fails with error message "org.libvirt.LibvirtException: Cannot access storage file /mnt/b33e5a1d-e4ea-3465-b6ac-c98dc8ff8af0/207-2-cc5fd717-2d57-3bb3-bcf6-2c930268db6c.iso"
By default, once we create a security group we cant change its name.
In this feature, we introduce a new API command "updateSecurityGroup"
which allows us to rename the security group name. Although we can't
change the name of the "default" security group.
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Enable PVLAN support on L2 networks
* Fix prevent null pointer on details
* Add marvin tests
* Fixes from comments
* Fix: missing pvlan type on plugniccommand
* Fix checks on network creation for vlans overlap
* Fix remove prefix from secondary vlan id
* Improve checks on physical network for pvlans
* Fix compatibility with previous pvlan creation
* Fix shared networks backwards pvlan compatibility
* Add ui fix for pvlan type not passed to api
* Add check for isolated vlan id overlap
* Include check for dynamic vlan reserved for secondary vlan
* Fix marvin tests errors
* Fix redundant imports
* Skip marvin test for pvlan if dvswitch is not present
* spelling
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
* server: fix resource count of primary storage if some volumes are Expunged but not removed
Steps to reproduce the issue
(1) create a vm and stop it. check resource count of primary storage
(2) download volume. resource count of primary storage is not changed.
(3) expunge the vm, the volume will be Expunged state as there is a volume snapshot on secondary storage. The resource count of primary storage decreased.
(4) update resource count of the account (or domain), the resource count of primary storage is reset to the value in step (2).
* New feature: Add support to destroy/recover volumes
* Add integration test for volume destroy/recover
* marvin: check resource count of more types
* messages translate to JP
* Update messages for CN
* translate message for NL
* fix two issues per Daan's comments
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
The VM ingestion feature allows CloudStack to discover, on-board, import existing VMs in an infra. The feature currently works only for VMware, with a hypervisor agnostic framework which may be extended for KVM and XenServer in future.
Add a global setting to disable creating networks with same name in an account
Add a global setting to disable creating network without
mentioning the start and end IPv4 or IPv6 address
By default we can create networks with the same name in the account.
Sometimes we should not create the networks with same name.
This change adds a global setting which prevents creating the network with same name.
The default value is true and set it to false to prevent creating network with same names.
Also its possible to create a shared network without mentioning the
start and the end IPv4 or IPv6 address.
This change adds a global setting which prevents creating a shared
network without specifying the start and the end IPv4 or IPv6 address
If the disk size of the vm to be created is greater
than the volume size, then the exception message should
display the numeric value instead of variable name
Fixes#3191
When a template is registered, code stores md5sum of the downloaded file in the vm_template table. However, this downloaded file could be deleted after template installation if it is not an actual (.qcow2, .ova, etc.) file. When the user copies a template using copyTemplate API, the actual template file will be copied across the image stores. Matching checksum for the copied templated file and the stored value from the vm_template table will result in a mismatch.
Changes will set an empty checksum value for the copied template while passing to download service which allows skipping wrong checksum check for the copied while install.
However, this results in a change in checksum value for concerned template entry in vm_template table post template install.
Co-authored-by: dahn <daan.hoogland@gmail.com>
* marvin: check resource count of more types
* New feature: add flag resource.count.running.vms.only to count resource consumption of only running vms
Stopped VMs do not use CPU/RAM actually.
A new global configuration resource.count.running.vms.only is added to determine whether resource (cpu/memory) of only running vms (including Starting/Stopping) will be taken into calculation of resource consumption.
* Add integration test for resource count of only running vms
Stop asking user (in the upgrade documentation) to remove a trailing slash for local KVM pool - do it here in upgrade path - so not needed in DOC for the upgrade to 4.14 and onwards.
When start a vm or migrate a vm (away from a host in host maintenance), cloudstack will check capacity of all hosts and choose one. If there are hundreds of hosts on the platform, it will take some seconds. When cloudstack choose a host and start/migrate vm to it, the resource consumption of the host might have been changed. This normally happens when we start/migrate multiple vms.
It would be better to double check the host capacity when start vm on a host.
This PR includes the fix for cpucore capacity when start/migrate a vm.
When we calculate a resource consumption of a host, we need to take the vms in following states into calculation: Running, Starting, Stopping, Migrating (to the host), and vms are Migrating from the host. Because, when stop a vm, the resource on host will be released when vm is stopped. When migrate a vm, the resource on destination host will be increased before migration starts, and resource on source host will be decreased after migraiton succeeds.
In cloudstack, there is a task named CapacityChecked which run every 5 minutes (capacity.check.period =300000 ms by default). It recalculates capacity of all hosts. However, it takes only vms in Running and Starting into consideration. We have faced some issues in host maintenance due to it.
Steps to reproduce the issue
(1) migrate N vms from host A to host B, cpu/ram resource increases before the migration.
(2) capacity check recalculate the capacity of hosts. used capacity of Host B will be reset to original value (not including the vms in Migrating).
(3) migrate some more vms from other host to host B, the migrations are allowed by cloudstack (because used capacity is incorrect). If the actual used memory exceed the physical memory on the host, there might be some critical issues (for example, libvirt dies)
* Suqash commits to a single commit and rebase against master
Update marvin tests to use white list
* * Fix marvin test failure
* Add new marvin negative tests cases
* Remove hard-coded hypervisor types in marvin tests
* Fix build error after rebase and add hugepagesless
* Fix readability of python code
* Fix failing test
* Adding cleanup of vms for negative tests
* Bug fixes - change config checks properly and block extraconfig in details
* Trim to compare the keys
* CR comments
* Don't skip extraconfig without exception
Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
* 4.13:
only update powerstate if sure it is the latest (#3743)
ui: fix migrate host form no host popup (#3682)
client: jetty session timeout set after server is started (#3658)
Increase DHCP lease time to infinite (#3662)
Currently in cloudstack, when we click on "Acquire New Ip", it will
randomly acquire IP from the pool. With this enhancement, it is
possible to select the IP from the drop down IP list of that network.
Same thing applies for a VPC as well.
* Service layer changes for new way of tracking maintanence progress
* Fixes after offline code review
* Fix marvin tests
* Change state name and add documentation
* Fix test
* Fix and add more unit tests for different caseS
* Fix and enhance Marvin Tests
* Fixes for corner cases
* More fixes and logging
* UI fixes
* Some minor changes and reducing VMs on host for more contained tests
* Fixed ssh client auth problem causing test failure
* Code review changes + fixes + some more logging
* Fix flaky tests by adding delays between host states
* Added fetching only enabled hosts for tests
* Make port blocking KVM specific and refactor to handle failure
* Make failing migrations due to tagged host instead of port blocking
* Added additional check for migrating VMs
* Refactor to use single place for methods checking maintenance states
* Add missing HA config keys
* Change time value to seconds
* Change Integer to Long
* Using ConfigKey defaultValue
* Do some code refactoring
* Simplify code
If the volume is in "Expunged" state then it should not be
considered towards total resource count of "primarystoragetotal"
field.
Currently cloudstack takes into resource calculation even if the
volume is expunged. The volume itself doesnt exist in primage
storage and hence it should not be considered towrds resource
caculation.
Steps to reproduce the issue:
1 . Get the resource count of "primarystoragetotal" of a particular domain.
2 . Create a VM with 5GB root disk size and stop it.
3 . Now the value of "primarystoragetotal" should be intitial value plus 5.
4 . Navigate to "volumes" of the VM and select "Download Volume" option.
5 . Once the volume is downloaded, expunge the VM.
6 . Get the resource count of "primarystoragetotal". it will be same value as in step 3
But it should be same as initial value obtained in step 1.
With this fix, the value obtained at step 6 will be same as in step 1.
In case an older SSVM is removed without changing it's state from Up
to Destroyed/Removed etc, the SSVM may be randomly selected for image
store related operations. This fix ensures that endpoints for an image
store are found only from a set of SSVM hosts that are not removed.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Diagnostics are hard when a snapshot fails if a null pointer occurs. This is because no stack trace or location of the error is logged. I.E.
2019-10-21 12:55:00,056 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Work-Job-Executor-131:ctx-80420156 job-10033827/job-10033828 ctx-4864e2f5) (logid:21454564) copy snasphot failed: java.lang.NullPointerException
* server: Do NOT cleanup dhcp and dns when stop a vm
According comment in PR #3608, dhcp and dns entries are cleaned up only when a VM is expunged.
Revert part of commit 8fb388e931.
* server: cleanup dns/dhcp entries in removeNic instead of finalizeExpunge
- Removes CentOS6/el6 packaging (voting thread reference https://markmail.org/message/u3ka4hwn2lzwiero)
- Add upgrade path from 4.13 to 4.14
- Enable live storage migration support for KVM by default as el6 is deprecated
- PRs using live storage migration
#2997 KVM VM live migration with ROOT volume on file storage type
#2983 KVM live storage migration intra cluster from NFS source and destination
#2298 CLOUDSTACK-9620: Enhancements for managed storage
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When a network IP range is removed, the "vlan" stays mapped on pod_vlan_map; therefore, the method that lists the VLANs by pod id will return null VLANS.
This PR adds proper verifications to avoid null pointer exception when deploying VRs on a pod with removed VLANs. The exception was caused on getPlaceholderNicForRouter.
Problem: In Vmware, appliances that have options that are required to be answered before deployments are configurable through vSphere vCenter user interface but it is not possible from the CloudStack user interface.
Root cause: CloudStack does not handle vApp configuration options during deployments if the appliance contains configurable options. These configurations are mandatory for VM deployment from the appliance on Vmware vSphere vCenter. As shown in the image below, Vmware detects there are mandatory configurations that the administrator must set before deploy the VM from the appliance (in red on the image below):
Solution:
On template registration, after it is downloaded to secondary storage, the OVF file is examined and OVF properties are extracted from the file when available.
OVF properties extracted from templates after being downloaded to secondary storage are stored on the new table 'template_ovf_properties'.
A new optional section is added to the VM deployment wizard in the UI:
If the selected template does not contain OVF properties, then the optional section is not displayed on the wizard.
If the selected template contains OVF properties, then the optional new section is displayed. Each OVF property is displayed and the user must complete every property before proceeding to the next section.
If any configuration property is empty, then a dialog is displayed indicating that there are empty properties which must be set before proceeding
image
The specific OVF properties set on deployment are stored on the 'user_vm_details' table with the prefix: 'ovfproperties-'.
The VM is configured with the vApp configuration section containing the values that the user provided on the wizard.
Fix regression bug that affects KVM local storage migration. Some of the desired execution flows for KVM local storage migration had been altered to allow only managed storage to execute. Fixed allowing managed and non managed storages to execute.
Fixes#3521
Retrieval of an image store using ImageStoreProviderManager has been refactored by introducing three different methods,
DataStore getRandomImageStore(List<DataStore> imageStores);
To get an image store for reading purpose. Threshold capacity check will not be used here.
DataStore getImageStoreWithFreeCapacity(List<DataStore> imageStores);
To get an image store for reading purpose. Threshold capacity check will be used here and the store with max free space will be returned. If no store with filled storage less than the threshold is found, the NULL value will be returned.
List<DataStore> listImageStoresWithFreeCapacity(List<DataStore> imageStores);
To get a list of image stores for writing purpose which fulfills threshold capacity check.
Correspondingly DataStoreManager methods have been refactored to return similar values for a given zone.
Fixes#3287 - NULL value will be returned when secondary storage is needed for writing but there is not store with free space.
Fixes#3041 - Rather than returning random secondary storage for writing, storage with max. free space will be returned.
Fixes#3478 - For migration on VMware, all writable secondary storage will be mounted while preparation.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Currently when refreshing disk usage stats all kvm agents are asked to collect stats for all volumes. In setups with multiple kvm hosts where managed storage is used, not all volumes are attached to all kvm hosts, this results in a large number of warnings in the kvm agent logs. This change introduces a filter step in case managed storage is used so that the management server only requests kvm agents for stats about volumes that are connected to each kvm host.
Fixes checkstyle issue caused by previous commit 6a511fce40
from PR #3466 where a minor review fix did not address this. Merging this
one to unblock few other PRs after running a local build test.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Add CephSnapshotStrategy to handle RBD revert (rollback) snapshot. In order to support RBD revert (rbd_rollback), this PR adds a CephSnapshotStrategy class to handle Ceph/RBD snapshot actions.
* Add revoke certificates API
* Add background task to sync certificates
* Fix marvin test and revoke certificate
* Fix certificate sent to hypervisor was missing headers
* Fix background task for uploading certificates to hosts
If projects have many resource tags, it will take a long time to list projects.
Remove resource tags information from project_view will fix it the issue.
Fixes#3178
Problem: Users don't know what keys/values to enter for template and VM details.
Root Cause: The feature does not exist that can list possible details and options.
Solution: Based on the possible VM and template details handled by the
codebase, those details were refactored and a list API is introduced
that can return users those details along with possible values. When
users add details now, they will be presented with a list of key details
and their possible options if any.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixes#2742
UI Supported ordering VPC Offerings but the API did not have that
support implemented. This makes the change in updateVPCOfferings
and listVPCOfferings API calls, along with necessary database
changes for supporting sorting of VPC Offerings.
Problem: The disk offering change is not reflected in cloud_usage database table.
Root Cause: The resizeVolume API does not publish the volume disk offering change event to the
cloud_usage database table.
Solution: This issue has been fixed by refactoring the resizeVolume API to publish this disk offering change for volumes that either in Allocated or Ready state.
Moves the method that published events for volumes in Ready state from
the VolumeStateListener class to the orchestrateResizeVolume method in
the VolumeApiService.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Problem: Not able to configure a sort order for the zones that are listed in various views in the UI.
Root Cause: There is no mechanism to accept sort key for existing zones or UI widget, that would allow to listing zones in the UI in a certain order.
Solution: The order of zones in listed in various views in the UI can now be configured through the newly added “sort_key” field added for the zone. It can be set using updateZone API by providing “sort_key” parameter for a zone, or by reordering the items in the zones list in the UI. UI has been updated to show ordering controls in zones list view. Database changes include updating table “data_center” by adding “sort_key” column (containing integer values and defaults to zero).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Problem: Admins don’t want to charge for IP address usage on certain (shared) networks.
Root Cause: There is no flag or detail for admins to provide using UI or API when creating networks to specify if they want IP address usage of the network hidden.
Solution: A new boolean hideipaddressusage flag is added to the createNetwork API and a checkbox in the ‘Add guest network’ UI for the root admins to specify if they want the shared network’s IP address usage to be hidden in the listUsageRecords API response. The provided flag is saved as the ‘hideIpAddressUsage’ detail in the cloud.network_details table for the network. For existing (shared) networks, root admins can also specify the same boolean API parameter hideipaddressusage with the updateNetwork API request to configure the behaviour for an existing network. When the detail/flag is true, the IP address usage for the (shared) network is not exported in the listUsageRecords API response. The listNetworks API response will include the details of a network for root admin only. (note usage is still recorded in the usage database but not return by the listUsageRecords API)
The API flag works for any kind of network via the API, but the checkbox is only shown while creating shared networks in the UI.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Fixes tests path from old layout to standard maven in src/test/java/
- Removed duplicate SnapshotManagerImpl at old path `server/src/com...`
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Feature Specification: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95653548
Live storage migration on KVM under these conditions:
From source and destination hosts within the same cluster
From NFS primary storage to NFS cluster-wide primary storage
Source NFS and destination NFS storage mounted on hosts
In order to enable this functionality, database should be updated in order to enable live storage capacibilty for KVM, if previous conditions are met. This is due to existing conflicts between qemu and libvirt versions. This has been tested on CentOS 6 hosts.
Additional notes:
To use this feature set the storage_motion_supported=1 in the hypervisor_capability table for KVM. This is done by default as the feature may not work in some environments, read below.
This feature of online storage+VM migration for KVM will only work with CentOS6 and possible Ubuntu as KVM hosts but not with CentOS7 due to:
https://bugs.centos.org/view.php?id=14026https://bugzilla.redhat.com/show_bug.cgi?id=1219541
On CentOS7 the error we see is: " error: unable to execute QEMU command 'migrate': this feature or command is not currently supported" (reference https://ask.openstack.org/en/question/94186/live-migration-unable-to-execute-qemu-command-migrate/). Reading through various lists looks like the migrate feature with qemu may be available with paid versions of RHEL-EV but not centos7 however this works with CentOS6.
Fix for CentOS 7:
Create repo file on /etc/yum.repos.d/:
[qemu-kvm-rhev]
name=oVirt rebuilds of qemu-kvm-rhev
baseurl=http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/
mirrorlist=http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-3.5-el7Server
enabled=1
skip_if_unavailable=1
gpgcheck=0
yum install qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64 qemu-kvm-ev-2.3.0-29.1.el7.x86_64 qemu-img-ev-2.3.0-29.1.el7.x86_64
Reboot host
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Problem: When a multi-disk OVA template is uploaded, only the root disk is recognized and VMs deployed using such template only get the root disk provisioned.
Root Cause: The template processor for multi-disk OVA was not used in the template upload processor.
Solution: Added support for local multi-disk OVA template upload. After a multi-disk OVA template is
uploaded, the mechanism that worked on multi-disk OVA templates registered using URL is now also used to discovers and creates data-disk templates in cloud.vm_template table and on the secondary storage.
To enable SSL on SSVMs :
• Upload the certificates like you usually do via the API or UI->Infrastructure tab
• Set the global settings secstorage.encrypt.copy, secstorage.ssl.cert.domain to appropriate values
along with the CPVM ones
• Restart management server (no need to destroy/restart SSVM (or the ssvm agent))
Test cases:
- Upload template and check it creates multi-disk folders on secondary
storage and entries in cloud.vm_template table
- Upload template and kill/shutdown management server. Then restart MS
to check if template sync works
- Copy template across zone of an uploaded template
Signed-off-by: Rohit Yadav rohit.yadav@shapeblue.com
This PR is for deactivating Ehcache in CloudStack since it is not usable. The first commit remove the default RMI cache peering configured for multicast which most of the time cannot work. It also requires to have an interface up which is not always the case while developing offline.
The second commits remove the configuration to activate caching on some DAOs.
Problems
The code in CS does not seem to fit any caching mechanism especially due to the homemade DAO code. The main 3 flaws are the following:
Entities are not expected to be shared
There is quite a lot of code with method calls passing entity IDs value as long, which does some object fetching. Without caching, this behavior will create distinct objects each time an entity with the same ID is fetched. With the cache enabled, the same object will be shared among those methods. It has been seen that it does generate some side effects where code still expected unchanged entity attributes after calling different methods thus generating exception/bugs.
DAO update operations are using search queries
Some part of the code are updating entities based on a search query, therefore the whole cache must be invalidated (see GenericDaoBase: public int update(UpdateBuilder ub, final SearchCriteria<?> sc, Integer rows);).
Entities based on views joining multiple tables
There are quite a lot of entities based on SQL views joining multiple entities in a same object. Enabling caching on those would require a mechanism to link and cross-remove related objects whenever one of the sub-entity is changed.
Final word
Based on the previously discussed points, the best approach IMHO would be to move out of the custom DAO framework in CS and use a well known one (out of scope of this change of course). It will handle caching well and the joins made by the views in the code. It's not an easy change, but it will fix along a lot of issues and add a proven / robust framework to an important part of the code.
Since the CloudStack virtual router was redesigned on version 4.6 it has been observed that the DHCP leases file is not persistent across network operations. This causes conflicts on guest VMs static IPs, causing these static IPs to not be renewed by the DHCP server running on isolated and VPC networks' virtual routers (dnsmasq). On stopping or destroying a VM, its dhcp/dns records are not removed from the virtual router causing ghost effects.
Fixes#3272Fixes#3354
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* DPDK vHost User mode selection
* SQL text field and DPDK classes refactor
* Fix NullPointerException after refactor
* Fix unit test
* Refactor details type
This commit simplifies the generateDestPath method and fixes an issue where an extra file, named as 'null', was created on the target storage pool during VM local storage volume migration. Without this fix, the VM is migrated and there is no data loss; however, 193 KB is allocated for the unused file named as 'null' and the file stays on the target storage.
Added changes for creating service offerings for specified domain(s) and zone(s).
Fixed checkAccess for disk offerings.
Fixed list APIs for disk and service offerings.
UI changes for creating disk, service offerings for specified domain(s) and zone(s).
Signed-off-by: Abhishek Kumar <abhishek.kumar@shapeblue.com>
Allows creating storage offerings associated with particular domain(s) and zone(s). In create disk/storage offfering form UI, a mult-select control has been addded to select desired zone(s) and domain select element has been made multi-select.
createDiskOffering API has been modified to allow passing list of domain and zone IDs with keys domainids and zoneids respectively. These lists are stored in DB in cloud.disk_offering_details table with 'domainids' and 'zoneids' key as string of comma separated list of IDs. Response for create, update and list disk offering APIs will return domainids, domainnames, zoneids and zonenames in details object of offering.
listDiskOfferings API has been modified to allow passing zoneid to return only offerings which are associated with the zone.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
Problem: Custom compute offering does not allow setting min and max values for CPU and VRAM for custom VMs.
Root Cause: Custom compute offerings cannot be created with a given range of CPU number and memory instead it allows only fixed values.
Solution: createServiceOffering API has been modified to allow setting a defined range for CPU number and memory. Also, UI form for compute offering creation is provided with a new field named 'compute offering type’ with values - Fixed, Custom Constrained, Custom Constrained. It will allow the creation of compute offerings either with a fixed CPU speed and memory for fixed compute offering, or with a range of CPU number and memory for custom constrained compute offering or without predefined CPU number, CPU speed and memory for custom unconstrained compute offering.
To allow the user to set CPU number, CPU speed and memory during VM deployment, UI form for VM deployment has been modified to provide controls to change these values. These controls are depicted in screenshots below for custom constrained and custom unconstrained compute offering types.
Sample API calls using cmk to create a constrained service offering and deploying a VM using it,
create serviceoffering name=Constrained displaytext=Constrained customized=true mincpunumber=2 maxcpunumber=4 cpuspeed=400 minmemory=256 maxmemory=1024
deploy virtualmachine displayname=ConstrainedVM serviceofferingid=60f3e500-6559-40b2-9a61-2192891c2bd6 templateid=8e0f4a3e-601b-11e9-9df4-a0afbd4a2d60 zoneid=9612a0c6-ed28-4fae-9a48-6eb207af29e3 details[0].cpuNumber=3 details[0].memory=800
Signed-off-by: Abhishek Kumar <abhishek.kumar@shapeblue.com>
- Fixes PR #3146 db cleanup to the correct 4.12->4.13 upgrade path
- Fixes failing unit test due to jdk specific changes after forward
merging
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* skip geting used bytes for volumes that are not in Ready state
* updated log message
* filter snapshots by state backedup
* removed * import
* filter templates by state 'DOWNLOADED'
* refactored getUsedBytes to use O(1) queries
* querying for ready volumes instead filtering in memory
* make listByStoreIdInReadyState more generic ex listByStoreIdAndState
* updated snapshot search criteria for listByStoreIdAndState
* updated template search criteria for listByPoolIdAndState
* fixed typo in search criteria for listByTemplateAndState
* fixed typo in search criteria for templates in listByPoolIdAndState
With this patch b766bf7
we started tracking disks in attaching state so that other attach request can fail gracefully. However this missed the case where disks were in allocated state but attach was requested.
For the use case where users want to attach disk in allocated state but not ready, we need to have allocated-attaching transition as well. We must take care of returning to the original state - allocated or ready - when attach request has completed.
For the use case of unstarted vm's the disk must proceed as follows - "Allocated" -> Attaching -> Allocated. When VM is started, the disk is "created" and pool is assigned. For the use case of started VMs it's more trivial and disk proceeds as follows - Ready -> Attaching -> Ready.
Test this by creating a VM with "startvm=false", create a disk and try attaching it in allocated state. It would give an exception on latest 4.11 but will be fixed on this patch.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Problem: Users are billed for destroyed VMs with VM snapshots because usage records don't get that the VM and VM snapshots are removed.
Root Cause: The destroyVirtualMachine and expungeVirtualMachine APIs were removing VM snapshots but not generating VMSNAPSHOT.DELETE usage event due to which the VM snapshots were not marked as removed in the usage_vmsnapshot table.
Solution: The issue was fixed by emitting the proper usage event for all the VM snapshots of a VM that is destroyed.
* Migrate template to target host if needed.
Fix KVM VM local storage live migration by migrating its template to the
target host if needed.
* Address reviewer and add method that updates the DB template reference
* Remove deprecated Config.PrimaryStorageDownloadWait
* Code formating of @Inject to follow checkstyle
* feature: add libvirt / qemu io bursting
Adds the ability to set bursting features from libvirt / qemu
This allows you to utilize the iops and bytes temporary "burst" mode
introduced with libvirt 2.4 and improved upon with libvirt 2.6.
https://blogs.igalia.com/berto/2016/05/24/io-bursts-with-qemu-2-6/
* updates per rafael et al
* The snapshot.backup.rightafter configuration variable was removed by:
SHA: 6bb0ca2f85
This adds it back, though named snapshot.backup.to.secondary now instead.
This global parameter, once set, will allow you to prevent automatic backups of
snapshots to secondary storage, unless they're actually needed.
Fixes#3096
* updates per review
* api: add command to list management servers
* api: add number of mangement servers in listInfrastructure command
* ui: add block for mangement servers on infra page
* api name resolution method cleanup
Offerings can co-exist where on does provide Security Grouping in the
network, but other guest Networks have no Security Grouping.
In V(X)LAN isolation environments the L2 separation is handled by V(X)LAN
and protection between Instances is handled by Security Grouping.
There are multiple scenarios possible where one network has Security Grouping
enabled because that is required in that network.
In the other network, but in the same zone it could be a choice to have
Security Grouping disabled and allow all traffic to flow.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* - Offline VM and Volume migration on Vmware hypervisor hosts
- Also add VM disk consolidation call on successful VM migrations
* Fix indentation of marvin test file and reformat against PEP8
* * Fix few comment typos
* Refactor debug messages to use String.format() when debug log level is enabled.
* Send list of commands returned by hypervisor Guru instead of explicitly selecting the first one
* Fix unhandled NPE during VM migration
* Revert back to distinct event descriptions for VM to host or storage pool migration
* Reformat test_primary_storage file against PEP-8 and Remove unused imports
* Revert back the deprecation messages in the custom StringUtils class to favour the use of the ApacheUtils
With IPv6 we are not using DHCP to allocate addresses, but using
StateLess Address Auto Configuration (SLAAC) a Instance will calculate
it's own address based on the Router Advertisements send out by the
routers in the network.
This Advertisement contains the IPv6 Subnet in use in that subnet and
allows to calculate the stable Address the Instance will obtain based
on it's MAC Address.
The existing code is 'dead code' as it has been written, but was never
used by any production code.
SLAAC only works properly with subnets of exactly 64-bits large.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
It's incorrect to use the findIncludingRemovedBy and
listIncludingRemovedBy for the common list and find operation.
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance
To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.
How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations' iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)
This PR adds the possibility to select a checkbox for the parameter bypassvlanoverlapcheck to the ajax request createNetwork. The checkbox was added for Guest Network as well as for the L2 Guest Network. For L2 Guest Network a backend check for the existence of the flag bypassvlanoverlapcheck was added.
* Allow KVM VM live migration with ROOT volume on file
* Allow KVM VM live migration with ROOT volume on file
- Add JUnit tests
* Address reviewers and change some variable names to ease future
implementation (developers can easily guess the name and use
autocomplete)
This commit allows deploying VMs with a specific IPv4 address.
DirectPodBasedNetworkGuru does not support requesting a custom
IP-Address while creating a new NIC/Instance, throwing the following
error:
Error 530: Does not support custom ip allocation at this time:
NicProfile[0-0-null-null-null
Unknown macro: { "cserrorcode"}
Some use-cases prefer the ability to request the IPv4 address which the
Instance will get.
This implementation adds unit test cases to cover and it was manually
tested in Basic Networking. I can perform more tests if requested.
With earlier work in Basic Networking and the security group provider IPv6 is
supported and we can allow IPv6 to be supplied in networks with SG enabled.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
The view "service_offering_view" doesn't include removed SOs, as a result when SO is removed, the bug happens. The PR introduces a change for resource calculation changing "service_offering_view" to "service_offering" table which has all service offerings.
Must be fixed in:
4.12
4.11
Fixes: #3009
This adds a new API updateVmwareDc that allows admins to update the
VMware datacenter details of a zone. It also recursively updates
the cluster_details for any username/password updates
as well as updates the url detail in cluster_details table and guid
detail in the host_details table with any newly provided vcenter
domain/ip. The update API assumes that there is only one vCenter per
zone. And, since the username/password for each VMware host could be different
than what gets configured for vcenter at zone level, it does not update the
username/password in host_details.
Previously, one has to manually update the db with any new vcenter details for the zone.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Remove some unused Classes
These classes were deleted because they have no references in our code base. They are not in Spring execution flow nor instantiated with "new":
- com.cloud.agent.api.CheckStateAnswer
- com.cloud.agent.api.StartupVMMAgentCommand
- com.cloud.agent.api.routing.UserDataCommand
- remove from description at
com.cloud.configuration.Config.ExecuteInSequenceNetworkElementCommands
enum
- com.cloud.agent.api.storage.UpgradeDiskCommand
- com.cloud.agent.api.storage.CreatePrivateTemplateCommand
- com.cloud.agent.api.storage.DestroyAnswer
- Note: "FIXME: Should have an DestroyAnswer" at
com.cloud.storage.resource.StoragePoolResource
- com.cloud.agent.api.storage.UpgradeDiskAnswer
- com.cloud.agent.api.storage.ManageVolumeAvailabilityAnswer
- com.cloud.agent.api.storage.ManageVolumeAvailabilityCommand
- com.cloud.exception.UsageServerException
- com.cloud.info.SecStorageVmLoadInfo
- com.cloud.serializer.SerializerHelper
* PR#1448 update description of 'execute.in.sequence.network.element.commands' param
Update description of 'execute.in.sequence.network.element.commands'parameter to reflect an unused command that has been removed. The removed class command is 'UserDataCommand'.
* Add cloud schema to update SQL
This force stops old VRs when performing rolling restart with
cleanup=true. This will ensure that VRs are powered off quickly than
wait longer for the normal ACPI shutdown. During testing, it was found
on VMware where VM stops are slow compared to XenServer and KVM.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This adds a global setting for admins who may not want the rolling
restart of routers or are seeing any issues around it. In future, this
setting may be removed.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes#2719 where private gateway IP might be incorrectly
programmed on a guest network nic. The VR would now check ipassoc
requests by mac addresses than provided nic/device id in case they are
wrong.
The root cause is that the device id information is lost when aggregated
commands are created upon starting of a new VPC VR, without the correct
device id in ip_associations json it mis-programs the VR.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
These boolean-return methods are named as "getXXX".
Other boolean-return methods are named as "isXXX".
Considering there methods will return boolean values, it should be more clear and consistent to rename them as "isXXX".
(rebase #2602 and #2816)
This fixes a regression introduced in #2799, by exporting $TYPE
before the `patch` is called to patch/extract archives for ssvm/cpvm.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
VMware router will be rebooted based on #2794, per current config
the VRs on reboot will go through fsck checks slowing down the deployment
process by few seconds. This will ensure that fsck checks are done
on every 3rd boot of the VR. The `4` is used because 1st boot is done
during the build of systemvmtemplate appliance.
Add upgrade path for a new 4.11.2 systemvmtemplate.
Other changes:
- Add support for XS 7.5 Fixes#2834.
- Reboot VR only if mgmt gw is not pingable on vmware.
- Enable passive ftp by enabling nf_conntrack_helper. This is change in behaviour since linux 4.7
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Add managed storage pool constraints to MigrateWithVolume API method
* Apply mike's suggestions
* Apply Mike's suggestion in a second review
* Mike's suggestions
* Confused bit
* just executeManagedStorageChecks
* Created methods `executeManagedStorageChecksWhenTargetStoragePoolNotProvided` and `executeManagedStorageChecksWhenTargetStoragePoolProvided`
* improve "executeManagedStorageChecksWhenTargetStoragePoolNotProvided"
* Fix "findVolumesThatWereNotMappedByTheUser" method
* Applu Mike's suggestion to improve "createMappingVolumeAndStoragePool" method
* Unit tests to cover modified code
* CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated
Storage pool checker is not being called on resize and migrate volume.
This may lead to allocated percentage of storage above 100%.
Setup:
1 VMware cluster with 2 Hosts.
Executed Steps:
Applied the following global settings:
storage.overprovisioning.factor = 1
pool.storage.allocated.capacity.disablethreshold = 1
pool.storage.capacity.disablethreshold = 1
Restarted management server
Executed Resize and migrate pool and Observed that Storage pool checker is not performed on resizeVolume and migrateVolume.
Result:
Root cause analysis shows storage pool checker is not called when doing migration and resizing.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In integration work for CCS I found that the service call UserVmService.destroyVm(long uuid, boolean expunge) does not honour the expunge flag. I traced it down to the implementation VirtualMachineManagerImpl.destroy(String vmUuid, boolean expunge).
Testing: manual testing so far, testing will pose some crosscutting challanges as the behaviour and implementation are seperated by about five layers of abstraction.
* rename package from "src" to "org.apache.cloudstack.storage.vmsnapshot"
* Remove Empty test class "SnapshotDataFactoryTest"
* Remove redundant imports (importing a class from its own package)
* Fix the problem at #1740 when it loads all snapshots in the primary storage
While checking a problem with @mike-tutkowski at PR #1740, we noticed that the method `org.apache.cloudstack.storage.snapshot.SnapshotDataFactoryImpl.getSnapshots(long, DataStoreRole)` was not loading the snapshots in `snapshot_store_ref` table according to their storage role. Instead, it would return a list of all snapshots (even if they are not in the storage role sent as a parameter) saying that they are in the storage that was sent as parameter.
* Add unit test
* 4.11:
Changed the implementation of isVolumeOnManagedStorage(VolumeInfo) to check if the data store in question is for primary storage (and added a unit test from Daan Hoogland)
vmware: reboot VR after mac updates (#2794)
* Cleaup and code-formatting POM files
* Remove obsolete mycila license-maven-plugin
* Remove obsolete console-proxy/plugin project
* Move console-proxy-rdbconsole under console-proxy parent
* Use correct parent path for rdpconsole
* Order alphabetally items in setnextversion.sh
* Unifiy License header in POMs
* Alphabetic order of modules definition
* Extract all defined versions into parent pom
* Remove obsolete files: version-info.in, configure-info.in
* Remove redundant defaultGoal
* Remove useless checkstyle plugin from checkstyle project
* Order alphabetally items in pom.xml
* Add aditional SPACEs to fix debian build
* Don't execute checkstyle on parent projects
* Use UTF-8 encoding in building checkstyle project
* Extract plugin versions into properties
* Execute PMD plugin on all the projects with -Penablefindbugs
* Upgrade maven plugins to latest version
* Make sure to always look for apache parent pom from repository
* Fix incorrect version grep in debian packaging
* Fix rebase conflicts
* Fix rebase conflicts
* Remove PMD for now to be fixed on another PR
This PR fixes NPE with the provisionCertificateCmd when reconnect is set to True.
Also fixes the following Marvin test failures:
- test_certauthority_root.py
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.
Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
This ensure that fewer mount points are made on hosts for either
primary storagepools or secondary storagepools.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* 4.11:
comment on unencryption
ui: fix create VPC dialog box failure when zone is SG enabled (#2704)
CLOUDSTACK-10381: Fix password reset / reset ssh key with ConfigDrive
isisnot=
extra message
debug message
imports
update without decrypt doesn't work
set unsensitive attributes as not 'Secure'
remove old config artifacts from update path
Changes in PR #2508 have caused network restart to fail in a Nuage setup,
as the new VR takes the same IP as the old one, and the old VR is still running.
Nuage doesn't support multiple VM's having the same IP.
We delay provisioning the interfaces in VSD until the old VR interface is released.
* Create unit test cases for 'ConfigDriveBuilder' class
* add method 'getProgramToGenerateIso' as suggested by rohit and Daan
* fix encoding for base64 to StandardCharsets.US_ASCII
* fix MockServerTest.testIsMockServerCanUpgradeConnectionToSsl()
This is another method that is causing Jenkins to fail for almost a month
Example: A VM that uses managed storage is stopped. The VM is then started on a different host in the same cluster. The Start operation fails.
To get around this issue, you must either start the VM up on the same host or on a host in a different cluster.
The reason is due to a slightly erroneous check in VolumeOrchestrator.prepare.
To solve this issue, we should be checking if the cluster ID changes, not if the host ID changes.
This introduces a new global setting `vm.configdrive.primarypool.enabled` to toggle creation/hosting of config drive iso files on primary storage, the default will be false causing them to be hosted on secondary storage. The current support is limited from hypervisor resource side and in current implementation limited to `KVM` only. The next big change is that config drive is created at a temporary location by management server and shipped to either KVM or SSVM agent via cmd-answer pattern, the data of which is not logged in logs. This saves us from adding genisoimage dependency on cloudstack-agent pkg.
The APIs to reset ssh public key, password and user-data (via update VM API) requires that VM should be shutdown. Therefore, in the refactoring I removed the case of updation of existing ISO. If there are objections I'll re-put the strategy to detach+attach new config iso as a way of updation. In the refactored implementation, the folder name is changed to lower-cased configdrive. And during VM start, migration or shutdown/removal if primary storage is enable for use, the KVM agent will handle cleanup tasks otherwise SSVM agent will handle them.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When a user shuts down their VM from the guest OS (and VM HA is enabled), the VM just powers itself back on. Our environment is on KVM hosts.
CloudStack does not know the difference between a VM failing or being shutdown from within the guest OS.
This is a major pain point for all our users - especially since they don't pay for VMs when they are shutoff. It is not intuitive for end-users to understand why they can't shutdown VMs from within the guest OS. Especially when they all come from (non-cloudstack) VMware and Hyper-V environments where this is not an issue.
However, if a host fails, we need VM HA to still work.
This PR that creates a configuration option "ha.vm.restart.hostup". With this option set to false, if CloudStack sees a VM shutdown out-of-band, but the host it was on is still online, then it won't power the VM back on. The logic is that since the host is online, it was most likely shutdown from the guest OS.
For when a host actually fails, standard VM HA logic takes over and powers on VMs (if they have VM HA enabled) if the host they were on fails.
If that "ha.vm.restart.hostup" option is true (the default to match current functionality), it works like always, and even in-guest shutdowns of VMs causes CloudStack to power back on the VM.
* Primary Storage count for an account does not decrease when a Data Disk is deleted
When a data disk is created and not attached in a running VM, the "deleteVolume" will not decrement the count for used primary storage in the VMs accounting information. The property that is not being decremented is called "primarystoragetotal"; this information can be retrieved via "listAccounts" API method.
Steps to reproduce this issue:
1 - Create an account, deploy a VM in it
2 - Check the primary storage count for the account with listAccounts API
3 - Create a data disk
4 - Check the primary storage count for the account with listAccounts API
5 - Delete the Data disk
6 - Check the primary storage count for the account with listAccounts API - It is the same as before deleting the data disk (it should not be the same as the value in step 2!)
* formatting and cleanups
* fix imports that were wrongly changed during rebase
This fixes config drive to use VM's user provided host-name instead of
the internal VM instance ID for hostname related config in both
cloudstack and openstack metadata bundled in the ISO.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-9184: Fixes#2631 VMware dvs portgroup autogrowth
This deprecates the vmware.ports.per.dvportgroup global setting.
The vSphere Auto Expand feature (introduced in vSphere 5.0) will take
care of dynamically increasing/decreasing the dvPorts when running out
of distributed ports . But in case of vSphere 4.1/4.0 (If used), as this
feature is not there, the new default value (=> 8) have an impact in the
existing deployments. Action item for vSphere 4.1/4.0: Admin should
modify the global configuration setting "vmware.ports.per.dvportgroup"
from 8 to any number based on their environment because the proposal
default value of 8 would be very less without auto expand feature in
general. The current default value of 256 may not need immediate
modification after deployment, but 8 would be very less which means
admin need to update immediately after upgrade.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a rolling restart of VRs when networks are restarted
with cleanup option for isolated and VPC networks. A make redundant option is
shown for isolated networks now in UI.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Using a hierarchy of database version rather than a flat
list of them. Adding a new schema upgrade path was really
cumbersome and error-prone, because we needed to maintain
a flat map of versions and their corresponding list of
upgrade paths (`DbUpgrade`). Instead we're using a logical
hierarchy structure of versions:
```
DatabaseVersionHierarchy.builder()
.next("4.0.0" , new Upgrade40to41())
.next("4.0.1" , new Upgrade40to41())
.next("4.0.2" , new Upgrade40to41())
.next("4.1.0" , new Upgrade410to420())
.next("4.1.1" , new Upgrade410to420())
.next("4.2.0" , new Upgrade420to421())
...
.next("4.2.1" , new Upgrade421to430())
.next("4.9.3.0" , new Upgrade4930to41000())
.next("4.10.0.0", new Upgrade41000to41100())
.next("4.11.0.0", new Upgrade41100to41110())
.build();
```
With this change, when we need to add a new version upgrade
path, we only need to add it in correct place in the hierarchy
rather than add that in dozens of places in `_upgradeMap`.
Supporting ConfigDrive user data on L2 networks.
Add UI checkbox to create L2 network offering with config drive.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10147 Disabled Xenserver Cluster can still deploy VM's. Added code to skip disabled clusters when selecting a host (#2442)
(cherry picked from commit c3488a51db)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10318: Bug on sorting ACL rules list in chrome (#2478)
(cherry picked from commit 4412563f19)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10284:Creating a snapshot from VM Snapshot generates error if hypervisor is not KVM.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10221: Allow IPv6 when creating a Basic Network (#2397)
Since CloudStack 4.10 Basic Networking supports IPv6 and thus
should be allowed to be specified when creating a network.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
(cherry picked from commit 9733a10ecd)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10214: Unable to remove local primary storage (#2390)
Allow admins to remove primary storage pool.
Cherry-picked from eba2e1d8a1
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* dateutil: constistency of tzdate input and output (#2392)
Signed-off-by: Yoan Blanc <yoan.blanc@exoscale.ch>
Signed-off-by: Daan Hoogland <daan.hoogland@shapeblue.com>
(cherry picked from commit 2ad5202823)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10054:Volume download times out in 3600 seconds (#2244)
(cherry picked from commit bb607d07a9)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* When creating a new account (via domain admin) it is possible to select “root admin” as the role for the new user (#2606)
* create account with domain admin showing 'root admin' role
Domain admins should not be able to assign the role of root admin to new users. Therefore, the role ‘root admin’ (or any other of the same type) should not be visible to domain admins.
* License and formatting
* Break long sentence into multiple lines
* Fix wording of method 'getCurrentAccount'
* fix typo in variable name
* [CLOUDSTACK-10259] Missing float part of secondary storage data in listAccounts
* [CLOUDSTACK-9338] ACS not accounting resources of VMs with custom service offering
ACS is accounting the resources properly when deploying VMs with custom service offerings. However, there are other methods (such as updateResourceCount) that do not execute the resource accounting properly, and these methods update the resource count for an account in the database. Therefore, if a user deploys VMs with custom service offerings, and later this user calls the “updateResourceCount” method, it (the method) will only account for VMs with normal service offerings, and update this as the number of resources used by the account. This will result in a smaller number of resources to be accounted for the given account than the real used value. The problem becomes worse because if the user starts to delete these VMs, it is possible to reach negative values of resources allocated (breaking all of the resource limiting for accounts). This is a very serious attack vector for public cloud providers!
* [CLOUDSTACK-10230] User should not be able to use removed “Guest OS type” (#2404)
* [CLOUDSTACK-10230] User is able to change to “Guest OS type” that has been removed
Users are able to change the OS type of VMs to “Guest OS type” that has been removed. This becomes a security issue when we try to force users to use HVM VMs (Meltdown/Spectre thing). A removed “guest os type” should not be usable by any users in the cloud.
* Remove trailing lines that are breaking build due to checkstyle compliance
* Remove unused imports
* fix classes that were in the wrong folder structure
* Updates to capacity management
This moves db upgrade paths and checks around a new systemvmtemplate
for 4.11.1. The new systemvmtemplate compared to 4.11.0 template
is slightly smaller and has meltdown/spectre fixes among few other
security fixes from Debian and changes to cloud-early-config.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This adds support for XenServer 7.3 and 7.4, and XCP-ng 7.4 version as hypervisor hosts. Fixes#2523.
This also fixes the issue of 4.11 VRs stuck in starting for up-to 10mins, before they come up online.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* [CLOUDSTACK-10323] Allow changing disk offering during volume migration
This is a continuation of work developed on PR #2425 (CLOUDSTACK-10240), which provided root admins an override mechanism to move volumes between storage systems types (local/shared) even when the disk offering would not allow such operation. To complete the work, we will now provide a way for administrators to enter a new disk offering that can reflect the new placement of the volume. We will add an extra parameter to allow the root admin inform a new disk offering for the volume. Therefore, when the volume is being migrated, it will be possible to replace the disk offering to reflect the new placement of the volume.
The API method will have the following parameters:
* storageid (required)
* volumeid (required)
* livemigrate(optional)
* newdiskofferingid (optional) – this is the new parameter
The expected behavior is the following:
* If “newdiskofferingid” is not provided the current behavior is maintained. Override mechanism will also keep working as we have seen so far.
* If the “newdiskofferingid” is provided by the admin, we will execute the following checks
** new disk offering mode (local/shared) must match the target storage mode. If it does not match, an exception will be thrown and the operator will receive a message indicating the problem.
** we will check if the new disk offering tags match the target storage tags. If it does not match, an exception will be thrown and the operator will receive a message indicating the problem.
** check if the target storage has the capacity for the new volume. If it does not have enough space, then an exception is thrown and the operator will receive a message indicating the problem.
** check if the size of the volume is the same as the size of the new disk offering. If it is not the same, we will ALLOW the change of the service offering, and a warning message will be logged.
We execute the change of the Disk offering as soon as the migration of the volume finishes. Therefore, if an error happens during the migration and the volume remains in the original storage system, the disk offering will keep reflecting this situation.
* Code formatting
* Adding a test to cover migration with new disk offering (#4)
* Adding a test to cover migration with new disk offering
* Update test_volumes.py
* Update test_volumes.py
* fix test_11_migrate_volume_and_change_offering
* Fix typo in Java doc
* CLOUDSTACK-10289: Config Drive Metadata: Use VM UUID instead of VM id
* CLOUDSTACK-10288: Config Drive Userdata: support for binary userdata
* CLOUDSTACK-10358: SSH keys are missing on Config Drive disk in some cases
* fix https://issues.apache.org/jira/browse/CLOUDSTACK-10356
* del patch file
* Update ResourceCountDaoImpl.java
* fix some format
* fix code
* fix error message in VolumeOrchestrator
* add check null stmt
* del import unuse class
* use BooleanUtils to check Boolean
* fix error message
* delete unuse function
* delete the deprecated function updateDomainCount
* add error log and throw exception in ProjectManagerImpl.java
* Add stack traces information
* update stack trace info
* update stack trace to make them consistent
* update stack traces
* update stacktraces
* update stacktraces for other similar situations
* fix some other situations
* enhance other situations
* Create database upgrade from 4.11.0.0 to 4.11.1.0 & VMWare version to OS mappings (#2490)
* Create database upgrade from 4.11.0.0 to 4.11.1.0. Add missing VMWare version to OS mapping SQL in the schema-41100to41110.sql.
* add unit test and add 4.11.0.0 entry to _upgradeMap
* upgrade 4.11.1 to 4.12 definition
* applied Nitin's comments
* Create database upgrade from 4.11.0.0 to 4.11.1.0. Add missing VMWare version to OS mapping SQL in the schema-41100to41110.sql.
* add unit test and add 4.11.0.0 entry to _upgradeMap
* CLOUDSTACK-10278 - WIP: need to test this script before create a pull request
* CLOUDSTACK-10278 - added more idempotent stored procs and moved all lines, that end with a semicolon in existing proc, onto one line because com/cloud/utils/db/ScriptRunner.java executes the sql as soon as it reads in line with a semicolon delimeter at the end.
* CLOUDSTACK-10278 - changed more sql statements to call idempotent stored procs
* CLOUDSTACK-10278 - WIP: need to test this script before create a pull request
* CLOUDSTACK-10278 - added more idempotent stored procs and moved all lines, that end with a semicolon in existing proc, onto one line because com/cloud/utils/db/ScriptRunner.java executes the sql as soon as it reads in line with a semicolon delimeter at the end.
* CLOUDSTACK-10278 - changed more sql statements to call idempotent stored procs
Since CloudStack 4.10 Basic Networking supports IPv6 and thus
should be allowed to be specified when creating a network.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
CLOUDSTACK-10341: VR minor fixes to systemvmtemplate (#2468)
CLOUDSTACK-10340: Add setter to hypervisorType in VMInstanceVO (#2504)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.
This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).
This FR introduces two new global settings:
- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
for the agent's background task that checks and switches to agent's
preferred host.
The indirect.agent.lb.algorithm supports following algorithm options:
- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).
Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.
Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
'indirect.agent.lb.algorithm' global settings.
On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.
First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.
From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.
Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.
Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- new flag `-T, --use-timestamp` to use `timestamp` when POM version contains SNAPSHOT
- in the final artifacts (jar) name
- in the final package (rpm, deb) name
- in `/etc/cloudstack-release` file of SystemVMs
- in the Management Server > About dialog
- if there's a "branding" string in the POM version (e.g. `x.y.z.a-NAME[-SNAPSHOT]`),
the branding name will be used in the final generated pacakge name such as following:
- `cloudstack-management-x.y.z.a-NAME.NUMBER.el7.centos.x86_64`
- `cloudstack-management_x.y.z.a-NAME-NUMBER~xenial_all.deb`
- branding string can be overriden with newly added `-b, --brand` flag
- handle the new format version for VR version
- fix long opts (they were broken)
- tolerate and show a warning message for unrecognized flags
- usage help reformat
* Deprecate Version class in favor or CloudStackVersion
* CLOUDSTACK-8855 Improve Error Message for Host Alert State
* [CLOUDSTACK-9846] create column to save the content of alert messages
Remove declaration of throws CloudRuntimeException
I also removed some unused variables and comments left behind
This closes#837
* Isolate a problematic test "smoke/test_certauthority_root"
* Refactored nuage tests
Added simulator support for ConfigDrive
Allow all nuage tests to run against simulator
Refactored nuage tests to remove code duplication
* Move test data from test_data.py to nuage_test_data.py
Nuage test data is now contained in nuage_test_data.py instead of
test_data.py
Removed all nuage test data from nuage_test_data.py
* CLOUD-1252 fixed cleanup of vpc tier network
* Import libVSD into the codebase
* CLOUDSTACK-1253: Volumes are not expunged in simulator
* Fixed some merge issues in test_nuage_vsp_mngd_subnets test
* Implement GetVolumeStatsCommand in Simulator
* Add vspk as marvin nuagevsp dependency, after removing libVSD dependency
* correct libVSD files for license purposes
pep8 pyflakes compliant
* [CLOUDSTACK-10314] Add Text-Field to each ACL Rule
It is interesting to have a text field (e.g. CHAR-256) added to each ACL rule, which allows to enter a "reason" for each FW Rule created. This is valuable for customer documentation, as well as best practice for an evidence towards auditing the system
* Formatting to make check style happy and code clean ups
* [CLOUDSTACK-10240] ACS cannot migrate a volume from local to shared storage.
CloudStack is logically restricting the migration of local storages to shared storage and vice versa. This restriction is a logical one and can be removed for XenServer deployments. Therefore, we will enable migration of volumes between local-shared storages in XenServers independently of their service offering. This will work as an override mechanism to the disk offering used by volumes. If administrators want to migrate local volumes to a shared storage, they should be able to do so (the hypervisor already allows that). The same the other way around.
* Cleanups implemented while working on [CLOUDSTACK-10240]
* Fix test case test_03_migrate_options_storage_tags
The changes applied were:
- When loading hypervisors capabilities we must use "default" instead of nulls
- "Enable" storage migration for simulator hypervisor
- Remove restriction on "ClusterScopeStoragePoolAllocator" to find shared pools
4.10.0.0 users when upgrade to 4.11.0.0 may face db related
discrepancies due to some PRs that got merged without moving their sql
changes to 4.10->4.11 upgrade path. The 4.10.0.0 users can run those
missing sql statements manually and then upgrade to 4.11.0.0, since a
workaround like this is possible this ticket is not marked a blocker. In
4.11.1.0+, we'll move those changes from 4.9.3.0->4.10.0.0 upgrade path
to 4.10.0.0->4.11.0.0 upgrade path. Ideally we should not be doing this,
but this will fix issues for a future 4.10.0.0 user who may want to
upgrade to 4.11.1.0 or 4.12.0.0+.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
ACS is accounting the resources properly when deploying VMs with custom service offerings. However, there are other methods (such as updateResourceCount) that do not execute the resource accounting properly, and these methods update the resource count for an account in the database. Therefore, if a user deploys VMs with custom service offerings, and later this user calls the “updateResourceCount” method, it (the method) will only account for VMs with normal service offerings, and update this as the number of resources used by the account. This will result in a smaller number of resources to be accounted for the given account than the real used value. The problem becomes worse because if the user starts to delete these VMs, it is possible to reach negative values of resources allocated (breaking all of the resource limiting for accounts). This is a very serious attack vector for public cloud providers!
Automate dynamic roles migration for missing props file
- In case commands.properties file is missing, enables dynamic roles.
- Adds a new -D or --default flag to migrate-dynamicroles.py script
to simply update the global setting and use the default role-rule
permissions.
- Add warning message, ask admins to move to dynamic roles during upgrade
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes move refactoring error introduced in #2283
For instance, the class DatadiskTO is supposed to be in com.cloud.agent.api.to package. However, the folder structure it was placed in is com.cloud.agent.api.api.to.
Skip tests for cloud-plugin-hypervisor-ovm3:
For some unknown reason, there are quite a lot of broken test cases for cloud-plugin-hypervisor-ovm3. They might have appeared after some dependency upgrade and was overlooked by the person updating them. I checked them to see if they could be fixed, but these tests are not developed in a clear and clean manner. On top of that, we do not see (at least I) people using OVM3-hypervisor with ACS. Therefore, I decided to skip them.
Identention corrected to use spaces instead of tabs in XML files
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.
- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
This fixes regression failures seen in Trillian, fixes NPEs that cause Travis related failures.
This also removes the aria2 dependency from rpms that require users to enable/install epel-release.
This finally updates the checksums for 4.11 systemvmtemplates in db upgrade path.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Allowed zone-wide primary storage based on a custom plug-in to be added via the GUI in a KVM-only environment (previously this only worked for XenServer and VMware)
Added support for root disks on managed storage with KVM
Added support for volume snapshots with managed storage on KVM
Enable creating a template directly from a volume (i.e. without having to go through a volume snapshot) on KVM with managed storage
Only allow the resizing of a volume for managed storage on KVM if the volume in question is either not attached to a VM or is attached to a VM in the Stopped state.
Included support for Reinstall VM on KVM with managed storage
Enabled offline migration on KVM from non-managed storage to managed storage and vice versa
Included support for online storage migration on KVM with managed storage (NFS and Ceph to managed storage)
Added support to download (extract) a managed-storage volume to a QCOW2 file
When uploading a file from outside of CloudStack to CloudStack, set the min and max IOPS, if applicable.
Included support for the KVM auto-convergence feature
The compression flag was actually added in version 1.0.3 (1000003) as opposed to version 1.3.0 (1003000) (changed this to reflect the correct version)
On KVM when using iSCSI-based managed storage, if the user shuts a VM down from the guest OS (as opposed to doing so from CloudStack), we need to pass to the KVM agent a list of applicable iSCSI volumes that need to be disconnected.
Added a new Global Setting: kvm.storage.live.migration.wait
For XenServer, added a check to enforce that only volumes from zone-wide managed storage can be storage motioned from a host in one cluster to a host in another cluster (cannot do so at the time being with volumes from cluster-scoped managed storage)
Don’t allow Storage XenMotion on a VM that has any managed-storage volume with one or more snapshots.
Enabled for managed storage with VMware: Template caching, create snapshot, delete snapshot, create volume from snapshot, and create template from snapshot
Added an SIOC API plug-in to support VMware SIOC
When starting a VM that uses managed storage in a cluster other than the one it last was running in, we need to remove the reference to the iSCSI volume from the original cluster.
Added the ability to revert a volume to a snapshot
Enabled cluster-scoped managed storage
Added support for VMware dynamic discovery
IPv4 and IPv6 are two different protocols and the presence of IPv6
in a network does not mean that IPv4 aliases/multiple subnets should
not be configured or supported by the VR.
This if-statement was written almost 5 years ago in a attempt to
add IPv6 support to CloudStack but was never fully implemented.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Extending Config Drive support
* Added support for VMware
* Build configdrive.iso on ssvm
* Added support for VPC and Isolated Networks
* Moved implementation to new Service Provider
* UI fix: add support for urlencoded userdata
* Add support for building systemvm behind a proxy
Co-Authored-By: Raf Smeets <raf.smeets@nuagenetworks.net>
Co-Authored-By: Frank Maximus <frank.maximus@nuagenetworks.net>
Co-Authored-By: Sigert Goeminne <sigert.goeminne@nuagenetworks.net>
During storage expunge domain resource statistics for primary storage space resource counter is not updated for domain. This leads to the situation when domain resource statistics for primary storage is overfilled (statistics only increase but not decrease).
Global scheduled task resourcecount.check.interval > 0 provides a workaround but not fixes the problem truly because when accounts inside domains use primary_storage allocation/deallocation intensively it leads to service block of operation.
NB: Unable to implement marvin tests because it (marvin) places in database weird primary storage volume size of 100 when creating VM from template. It might be a sign of opening a new issue for that bug.
CloudStack volumes and templates are one single virtual disk in case of XenServer/XCP and KVM hypervisors since the files used for templates and volumes are virtual disks (VHD, QCOW2). However, VMware volumes and templates are in OVA format, which are archives that can contain a complete VM including multiple VMDKs and other files such as ISOs. And currently, Cloudstack only supports Template creation based on OVA files containing a single disk. If a user creates a template from a OVA file containing more than 1 disk and launches an instance using this template, only the first disk is attached to the new instance and other disks are ignored.
Similarly with uploaded volumes, attaching an uploaded volume that contains multiple disks to a VM will result in only one VMDK to being attached to the VM.
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Support+OVA+files+containing+multiple+disks
This behavior needs to be improved in VMWare to support OVA files with multiple disks for both uploaded volumes and templates. i.e. If a user creates a template from a OVA file containing more than 1 disk and launches an instance using this template, the first disk should be attached to the new instance as the ROOT disk and volumes should be created based on other VMDK disks in the OVA file and should be attached to the instance.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
Steps to reproduce issue
Deploy a VM
Take snapshot of the root volume
Delete the snapshot
Before the garbage collector has run, shutdown the VM and assign the VM to other user.
When garage collector executes NPE shows in the logs.
This feature allow admins to dedicate a range of public IP addresses to the SSVM and CPVM, such that they can be subject to specific external firewall rules. The option to dedicate a public IP range to the System VMs (SSVM & CPVM) is added to the createVlanIpRange API method and the UI.
Solution:
Global setting 'system.vm.public.ip.reservation.mode.strictness' is added to determine if the use of the system VM reservation is strict (when true) or preferred (false), false by default.
When a range has been dedicated to System VMs, CloudStack should apply IPs from that range to
the public interfaces of the CPVM and the SSVM depending on global setting's value:
If the global setting is set to false: then CloudStack will use any unused and unreserved public IP
addresses for system VMs only when the pool of reserved IPs has been exhausted
If the global setting is set to true: then CloudStack will fail to deploy the system VM when the pool
of reserved IPs has been exhausted, citing the lack of available IPs.
UI Changes
Under Infrastructure -> Zone -> Physical Network -> Public -> IP Ranges, button 'Account' label is refactored to 'Set reservation'.
When that button is clicked, dialog displayed is also refactored, including a new checkbox 'System VMs' which indicates if range should be dedicated for CPVM and SSVM, and a note indicating its usage.
When clicking on button for any created range, UI dialog displayed indicates whether IP range is dedicated for system vms or not.
The first PR(#1176) intended to solve #CLOUDSTACK-9025 was only tackling the problem for CloudStack deployments that use single hypervisor types (restricted to XenServer). Additionally, the lack of information regarding that solution (poor documentation, test cases and description in PRs and Jira ticket) led the code to be removed in #1124 after a long discussion and analysis in #1056. That piece of code seemed logicless (and it was!). It would receive a hostId and then change that hostId for other hostId of the zone without doing any check; it was not even checking the hypervisor and storage in which the host was plugged into.
The problem reported in #CLOUDSTACK-9025 is caused by partial snapshots that are taken in XenServer. This means, we do not take a complete snapshot, but a partial one that contains only the modified data. This requires rebuilding the VHD hierarchy when creating a template out of the snapshot. The point is that the first hostId received is not a hostId, but a system VM ID(SSVM). That is why the code in #1176 fixed the problem for some deployment scenarios, but would cause problems for scenarios where we have multiple hypervisors in the same zone. We need to execute the creation of the VHD that represents the template in the hypervisor, so the VHD chain can be built using the parent links.
This commit changes the method com.cloud.hypervisor.XenServerGuru.getCommandHostDelegation(long, Command). From now on we replace the hostId that is intended to execute the “copy command” that will create the VHD of the template according to some conditions that were already in place. The idea is that starting with XenServer 6.2.0 hotFix ESP1004 we need to execute the command in the hypervisor host and not from the SSVM. Moreover, the method was improved making it readable and understandable; it was also created test cases assuring that from XenServer 6.2.0 hotFix ESP1004 and upward versions we change the hostId that will be used to execute the “copy command”.
Furthermore, we are not selecting a random host from a zone anymore. A new method was introduced in the HostDao called “findHostConnectedToSnapshotStoragePoolToExecuteCommand”, using this method we look for a host that is in the cluster that is using the storage pool where the volume from which the Snaphost is taken of. By doing this, we guarantee that the host that is connected to the primary storage where all of the snapshots parent VHDs are stored is used to create the template.
Consider using Disabled hosts when no Enabled hosts are found
This also closes#2317
This extends work presented on #2048 on which the ability to extend the management range is provided.
Aim
This PR allows separating the management network subnet on which SSVM and CPVM are from the virtual routers management subnet.
Detailed use case
PCI compliance requires that network elements are defined as ‘in scope’ or ‘out of scope’, for compliance purposes. The SSVM and CPVM are both in scope as they allow public HTTP or HTTPS connections. The virtual routers have been defined as out of scope as they have been placed entirely in a firewalled network's segment. However, all of the system VM types share management network. As SSVM and CPVM are both in scope this would bring the virtual routers into scope as well, requiring individual audits of every virtual router. As this is not practical, the ‘management network’ which the SSVM and CPVM are on, and the management network which the virtual routers are on, must be separated by a firewall.
Description
By this feature it is possible to dedicate a created range for SSVM and CPVM (system vms) and provide a VLAN ID for its range.
A new boolean global configuration is added: system.vm.management.ip.reservation.mode.strictness. If enabled, the use of System VMs management IP reservation is strict, preferred if not. Default value is false (preferred).
Strict reservation: System VMs should try to get a private IP from a range marked for system vms. If not available, deployment fails
Preferred reservation: System VMS will try to get a private IP from a range marked for system vms. If not available, IP for range not marked for system vms is taken.
In CLOUDSTACK-9886, we are reading ping.interval and ping.timeout using configdao which involves direct reading of DB. So, replaced it with ConfigKey based approach.
Snapshot on primary storage not cleaned up after Storage migration. This happens in the following scenario:
Steps To Reproduce
Create an instance on the local storage on any host
Create a scheduled snapshot of the volume:
Wait until ACS created the snapshot. ACS is creating a snapshot on local storage and is transferring this snapshot to secondary storage. But the latest snapshot on local storage will stay there. This is as expected.
Migrate the instance to another XenServer host with ACS UI and Storage Live Migration
The Snapshot on the old host on local storage will not be cleaned up and is staying on local storage. So local storage will fill up with unneeded snapshots.
Consider this scenario:
1. User launches a VM from Template and keep it running
2. Admin logins and deleted that template [CloudPlatform does not check existing / running VM etc. while the deletion is done]
3. User resets the VM
4. CloudPlatform fails to star the VM as it cannot find the corresponding template.
It throws error as
java.lang.RuntimeException: Job failed due to exception Resource [Host:11] is unreachable: Host 11: Unable to start instance due to can't find ready template: 209 for data center 1
at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:113)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:495)
Client is requesting better handing of this scenario. We need to check existing / running VM's when the template is deleted and warn admin about the possible issue that may occur.
REPRO STEPS
==================
1. Launches a VM from Template and keep it running
2. Now delete that template
3. Reset the VM
4. CloudPlatform fails to star the VM as it cannot find the corresponding template.
EXPECTED BEHAVIOR
==================
Cloud platform should throw some warning message while the template is deleted if that template is being used by existing / running VM's
ACTUAL BEHAVIOR
==================
Cloud platform does not throw as waring etc.
Ran into an issue today where we passed both the "id" and "domainid" parameters into "listAccounts" and received a response despite the account id passed not belonging to the domainid passed.
Allow usage of "domainid" AND "id" in "listAccounts"
- Adding "AccountDoa::findActiveAccountById"
- Adding "AccountDaoImpl::findActiveAccountById"
- Removing seemingly pointless "listForDomain" parameter
- Updating "typeNEQ" value from "5" to "Account.ACCOUNT_TYPE_PROJECT"
(which is "5")
- Only attempt to load domain for "path" query parameter once
"searchForAccountsInternal" input validation logic pseudo-code:
- If "domainid" set, check immediately
- If "id" not set:
- and user is admin and "listall" is true
- if "domainid" not set, use caller domain id
- force "isrecursive" true
- else use caller account id
- Else if "domainid" and "name" set
- verify existence of account and that user has access
- Else:
- if "domainid" not set, locate account by "id"
- else, locate account by "id" and "domainid"
- verify account found and caller has access rights
Python base64 requires that the string is a multiple of 4 characters but
the Apache codec does not. RFC states is not mandatory so the data should
not fail the VR script (vmdata.py).
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This includes test related fixes and code review fixes based on
reviews from @rafaelweingartner, @marcaurele, @wido and @DaanHoogland.
This also includes VMware disk-resize limitation bug fix based on comments
from @sateesh-chodapuneedi and @priyankparihar.
This also includes the final changes to systemvmtemplate and fixes to
code based on issues found via test failures.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This moves the systevmtemplate migration logic from previous upgrade path
to 4.10.0.0->4.11.0.0 upgrade path.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In default/fresh installations, the guest os type for systemvms with id=15
or Debian 5 (32-bit) can cause memory allocation issues to guest. Using
Other Linux 64-bit as guest OS systemvms get all the allocated RAM. This
avoids OOM related kernel panics for certain VRs such as rVRs, lbvm etc.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes test failures around VMware with the new systemvmtemplate.
In addition:
- Does not skip rVR related test cases for VMware
- Removes rc.local
- Processes unprocessed cmd_line.json
- Fixed NPEs around VMware tests/code
- On VMware, use udevadm to reconfigure nic/mac address than rebooting
- Fix proper acpi shutdown script for faster systemvm shutdowns
- Give at least 256MB of swap for VRs to avoid OOM on VMware
- Fixes smoke tests for environment related failures
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Refactors and simplifies systemvm codebase file structures keeping
the same resultant systemvm.iso packaging
- Password server systemd script and new postinit script that runs
before sshd starts
- Fixes to keepalived and conntrackd config to make rVRs work again
- New /etc/issue featuring ascii based cloudmonkey logo/message and
systemvmtemplate version
- SystemVM python codebase linted and tested. Added pylint/pep to
Travis.
- iptables re-application fixes for non-VR systemvms.
- SystemVM template build fixes.
- Default secondary storage vm service offering boosted to have 2vCPUs
and RAM equal to console proxy.
- Fixes to several marvin based smoke tests, especially rVR related
tests. rVR tests to consider 3*advert_int+skew timeout before status
is checked.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Fixes timezone issue where dates show up as nvalid in UI
- Introduces new event timeline listing/filtering of events
- Several UI improvements to add columns in list views
- Bulk operations support in instance list view to shutdown and destroy
multiple-selected VMs (limitation: after operation, redundant entries
may show up in the list view, refreshing VM list view fixes that)
- Align table thead/tbody to avoid splitting of tables
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Refactor cloud-early-config and make appliance specific scripts
- Make patching work without requiring restart of appliance and remove
postinit script
- Migrate to systemd, speedup booting/loading
- Takes about 5-15s to boot on KVM, and 10-30seconds for VMware and XenServer
- Appliance boots and works on KVM, VMware, XenServer and HyperV
- Update Debian9 ISO url with sha512 checksum
- Speedup console proxy service launch
- Enable additional kernel modules
- Remove unknown ssh key
- Update vhd-util URL as previous URL was down
- Enable sshd by default
- Use hostnamectl to add hostname
- Disable services by default
- Use existing log4j xml, patching not necessary by cloud-early-config
- Several minor fixes and file refactorings, removed dead code/files
- Removes inserv
- Fix dnsmasq config syntax
- Fix haproxy config syntax
- Fix smoke tests and improve performance
- Fix apache pid file path in cloud.monitoring per the new template
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows CloudStack administrators to create layer 2 networks on CloudStack. As these networks are purely layer 2, they don't require IP addresses or Virtual Router, only VLAN is necessary (provided by administrator or assigned by CloudStack). Also, network services should be handled externally, e.g. DNS, DHCP, as they are not provided by L2 networks.
As a consequence, a new Guest Network type is created within CloudStack: L2
Description:
Network offerings and networks support new guest type: L2.
L2 Network offering creation allows administrator to select Specify VLAN or let CloudStack assign it dynamically.
L2 Network creation allows administrator to specify VLAN tag (if network offerings allows it) or simply create network.
VM deployments on L2 networks:
VMs should not IP addresses or any network service
No Virtual Router deployed on network
If Specify VLAN = true for network offering, network gets implemented using a dynamically assigned VLAN
UI changes
A new button is added on Networks tab, available for admins, to allow L2 networks creation
At present, The management IP range can only be expanded under the same subnet. According to existing range, either the last IP can be forward extended or the first IP can be backward extended. But we cannot add an entirely different range from the same subnet. So the expansion of range is subnet bound, which is fixed. But when the range gets exhausted and a user wants to deploy more system VMs, then the operation would fail. The purpose of this feature is to expand the range of management network IPs within the existing subnet. It can also delete and list the IP ranges.
Please refer the FS here: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Expansion+of+Management+IP+Range
MySQLTransactionRollbackException is seen frequently in logs
Root Cause
Attempts to lock rows in the core data access layer of database fails if there is a possibility of deadlock. However Operations are not getting retried in case of deadlock. So introducing retries here
Solution
Operations would be retried after some wait time in case of dead lock exception.
When a template is copied back to zone after it is deleted. deleted field gets reset to null. delete field is added to Search on template zone mapping table to take care of the existing mapping.
ISSUE :
VM start API Job returns success for failed Job
STEPS TO REPRODUCE :
Stop VM instance.
Mark state of associate router as Stopping from DB
Execute startVirtualMachine Api from rest client.
VM will not start but still main job will return the status as SUCCEEDED, whereas sub-job for command VmWorkStart will return status as failed.
Related 86bbe211f2 and CLOUDSTACK-494. Currently we can not have serveral VPCs in one account with different VPN customer gateways configuration per same gateway IP.
This commit adds support for passing IPv6 Addresses and/or Subnets as
Secondary IPs.
This is groundwork for CLOUDSTACK-9853 where IPv6 Subnets have to be
allowed in the Security Groups of Instances to we can add DHCPv6
Prefix Delegation.
Use ; instead of : for separating addresses, otherwise it would cause
problems with IPv6 Addresses.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Depending on the timezone you're running CS (before GMT timezones) you could experience that some jobs are marked as failed since the parent job got a null result despite its child job having successfully done the job. The child job got deleted by the CleanupTask ahead of time, due to a missing datetime conversion to GMT timezone.
Jobs are failing with this message: Job failed with un-handled exception
The fix intends to correct any datetime used in the code that should be using the GMT timezone instead of the local one since all DB datetime should be stored at GMT.
This fixes the following:
- Unchecked thread growth in RemoteEndHostEndPoint
- Potential NPE while finding EP for a storage/scope
Unbounded thread growth can be reproduced with following findings:
- Every unreachable template would produce 6 new threads (in a single
ScheduledExecutorService instance) spaced by 10 seconds
- Every reachable template url without the template would produce 1 new
thread (and one ScheduledExecutorService instance), it errors out quickly without
causing more thread growth.
- Every valid url will produce upto 10 threads as the same ep (endpoint
instance) will be reused to query upload/download (async callback)
progresses.
Every RemoteHostEndPoint instances creates its own
ScheduledExecutorService instance which is why in the jstack dump, we
see several threads that share the prefix RemoteHostEndPoint-{1..10}
(given poolsize is defined as 10, it uses suffixes 1-10).
This fixes the discovered thread leakage with following notes:
- Instead of ScheduledExecutorService instance, a cached pool could be
used instead and was implemented, and with `static` scope to be reused
among other future RemoteHostEndPoint instances.
- It was not clear why we would want to wait when we've Answers returned
from the remote EP, and therefore a scheduled/delayed Runnable was
not required at all for processing answers. ScheduledExecutorService
was therefore not really required, moved to ExecutorService instead.
- Another benefit of using a cached pool is that it will shutdown
threads if they are not used in 60 seconds, and they get re-used for
future runnable submissions.
- Caveat: the executor service is still unbounded, however, the use-case
that this method is used for short jobs to check upload/download
progresses fits the case here.
- Refactored CmdRunner to not use/reference objects from parent class.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Tags field to be included in the listusagerecords response such that it can be used in billing report. E.g.
"tags":[
{"key":"city","value":"Toronto","resourcetype":"UserVm","resourceid":"a0cca906-f985-4b56-ad11-f33e59c4c733","account":"admin","domainid":"dec39eb8-4f81-11e7-8315-067fa0000031","domain":"ROOT"}
,
{"key":"region","value":"canada","resourcetype":"UserVm","resourceid":"a0cca906-f985-4b56-ad11-f33e59c4c733","account":"admin","domainid":"dec39eb8-4f81-11e7-8315-067fa0000031","domain":"ROOT"}
- Migrate to embedded Jetty server.
- Improve ServerDaemon implementation.
- Introduce a new server.properties file for easier configuration.
- Have a single /etc/default/cloudstack-management to configure env.
- Reduce shaded jar file, removing unnecessary dependencies.
- Upgrade to Spring 5.x, upgrade several jar dependencies.
- Does not shade and include mysql-connector, used from classpath instead.
- Upgrade and use bountcastle as a separate un-shaded jar dependency.
- Remove tomcat related configuration and files.
- Have both embedded UI assets in uber jar and separate webapp directory.
- Refactor systemd and init scripts, cleanup packaging.
- Made cloudstack-setup-databases faster, using `urandom`.
- Remove unmaintained distro packagings.
- Moves creation and usage of server keystore in CA manager, this
deprecates the need to create/store cloud.jks in conf folder and
the db.cloud.keyStorePassphrase in db.properties file. This also
remove the need of the --keystore-passphrase in the
cloudstack-setup-encryption script.
- GZip contents dynamically in embedded Jetty
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* VSP ID Caching
* VSP call Statistics
* 5.0 Support
Co-Authored-By: Frank Maximus <frank.maximus@nuagenetworks.net>
Co-Authored-By: Raf Smeets <raf.smeets@nuagenetworks.net>
Allow security policies to apply on port groups:
- Accepts security policies while creating network offering
- Deployed network will have security policies from the network offering
applied on the port group (in vmware environment)
- Global settings as fallback when security policies are not defined for a network
offering
- Default promiscuous mode security policy set to REJECT as it's the default
for standard/default vswitch
Portgroup vlan-trunking options for dvswitch: This allows admins to define
a network with comma separated vlan id and vlan
range such as vlan://200-400,21,30-50 and use the provided vlan range to
configure vlan-trunking for a portgroup in dvswitch based environment.
VLAN overlap checks are performed for:
- isolated network against existing shared and isolated networks
- dedicated vlan ranges for the physical/public network for the zone
- shared network against existing isolated network
Allow shared networks to bypass vlan overlap checks: This allows admins
to create shared networks with a `bypassvlanoverlapcheck` API flag
which when set to 'true' will create a shared network without
performing vlan overlap checks against isolated network and against
the vlans allocated to the datacenter's physical network (vlan ranges).
Notes:
- No vlan-range overlap checks are performed when creating shared networks
- Multiple vlan id/ranges should include the vlan:// scheme prefix
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This implements a CloudStack Prometheus exporter as a plugin, that serves
metrics on a HTTP port.
New global settings:
1. prometheus.exporter.enable - (default: false), Enable the prometheus
exporter plugin, management server restart needed.
2. prometheus.exporter.port - (default: 9595), The prometheus exporter
server port.
3. prometheus.exporter.allowed.ips - (default: 127.0.0.1), List of comma
separated prometheus server ips (with no spaces) that should be allowed to
access the URLs.
The following list of metrics are provided per pop (zone) with the exporter:
• Per host:
o CPU cores: used, total
o CPU usage: used, total (in MHz)
o Memory usage: used, total (in MiBs)
o Total VMs running on the host
• CPU cores: allocated (per zone)
• CPU usage: allocated (per zone, in MHz)
• Memory usage: allocated (per zone, in MiBs)
• Hosts: online, offline, total
• VMs: in all states -- starting, running, stopping, stopped, destroyed,
expunging, migrating, error, unknown
• Volumes: ready, destroyed, total
• Primary Storage Pool: (Disk size) used, allocated, unallocated, total (in GiBs)
• Secondary Storage Pool: (Disk size) used, allocated, unallocated, total (in GiBs)
• Private IPs: allocated, total
• Public IPs: allocated, total
• Shared Network IPs: allocated, total
• VLANs: allocated, total
Additional metrics for the environment:
• Summed domain (level=1) limit for CPU cores
• Summed domain (level=1) limit for memory/ram
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* CLOUDSTACK-10046 digest helper for calculating checksums
* CLOUDSTACK-10046 cleanup unused checksum code
* CLOUDSTACK-10046 padding method proof of concept
* CLOUDSTACK-10046 only compare checksums if old value is valid
* Adding positive and negative tests for md5, sha-1 and sha-256, for xen, vmware and kvm hypervisors.
KVM Results:
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 189, in test_02_1_create_template_with_checksum_sha1_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{sha-1}bf580a13f791d86acf3449a7b457a91a14389264" didn\'t match the given value, "{sha-1}someInvalidValue"\n']
=== TestName: test_02_1_create_template_with_checksum_sha1_negative | Status : SUCCESS ===
=== TestName: test_02_create_template_with_checksum_sha1 | Status : SUCCESS ===.
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 203, in test_03_1_create_template_with_checksum_sha256_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{SHA-256}efc03633f2b8f5db08acbcc5dc1be9028572dfd8f1c6c8ea663f0ef94b458c5" didn\'t match the given value, "{SHA-256}someInvalidValue"\n']
=== TestName: test_03_1_create_template_with_checksum_sha256_negative | Status : SUCCESS ===
=== TestName: test_03_create_template_with_checksum_sha256 | Status : SUCCESS ===
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 217, in test_04_1_create_template_with_checksum_md5_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{md5}ada77653dcf1e59495a9e1ac670ad95f" didn\'t match the given value, "{md5}someInvalidValue"\n']
=== TestName: test_04_1_create_template_with_checksum_md5_negative | Status : SUCCESS ===
=== TestName: test_04_create_template_with_checksum_md5 | Status : SUCCESS ===
* CLOUDSTACK-10046 digest helper for calculating checksums
* CLOUDSTACK-10046 cleanup unused checksum code
* CLOUDSTACK-10046 padding method proof of concept
* CLOUDSTACK-10046 only compare checksums if old value is valid
* Adding positive and negative tests for md5, sha-1 and sha-256, for xen, vmware and kvm hypervisors.
KVM Results:
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 189, in test_02_1_create_template_with_checksum_sha1_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{sha-1}bf580a13f791d86acf3449a7b457a91a14389264" didn\'t match the given value, "{sha-1}someInvalidValue"\n']
=== TestName: test_02_1_create_template_with_checksum_sha1_negative | Status : SUCCESS ===
=== TestName: test_02_create_template_with_checksum_sha1 | Status : SUCCESS ===.
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 203, in test_03_1_create_template_with_checksum_sha256_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{SHA-256}efc03633f2b8f5db08acbcc5dc1be9028572dfd8f1c6c8ea663f0ef94b458c5" didn\'t match the given value, "{SHA-256}someInvalidValue"\n']
=== TestName: test_03_1_create_template_with_checksum_sha256_negative | Status : SUCCESS ===
=== TestName: test_03_create_template_with_checksum_sha256 | Status : SUCCESS ===
Negative Test Passed - Exception Occurred Under template download ['Traceback (most recent call last):\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 217, in test_04_1_create_template_with_checksum_md5_negative\n self.download(self.apiclient, template.id)\n', ' File "/Users/bstoyanov/Documents/sb2/cloudstack/test/integration/smoke/test_templates.py", line 260, in download\n template.status)\n', 'Exception: Failed to download template: status - Failed post download script: checksum "{md5}ada77653dcf1e59495a9e1ac670ad95f" didn\'t match the given value, "{md5}someInvalidValue"\n']
=== TestName: test_04_1_create_template_with_checksum_md5_negative | Status : SUCCESS ===
=== TestName: test_04_create_template_with_checksum_md5 | Status : SUCCESS ===
* Adding additional test with no checksum added when registering template
Result:
test_05_create_template_with_no_checksum (integration.smoke.test_templates.TestCreateTemplateWithChecksum) ... === TestName: test_05_create_template_with_no_checksum | Status : SUCCESS ===
ok
----------------------------------------------------------------------
Ran 1 test in 42.320s
OK
* Fixing negative tests exception handling
* Adding tests for ISO checksum validation and fixing a zero prefix failure test in templates
* CLOUDSTACK-10046 padding
* CLOUDSTACK-10046 usability additions
* yet another IDE artifact hindering checkstyle
Co-Authored-By: Prashanth Manthena <prashanth.manthena@nuagenetworks.net>
Co-Authored-By: Sigert Goeminne <sigert.goeminne@nuagenetworks.net>
Bug: https://issues.apache.org/jira/browse/CLOUDSTACK-9832
Detail:
When the VPC offering does not contain VpcVirtualRouter as a SourceNat provider,
then we will not add the interface in the public network to the VpcVR.
CLOUDSTACK-9832: Move isSrcNat check to VpcManager
* CLOUDSTACK-9899 adding a global setting for not checking URLs from the MS
* CLOUDSTACK-9899 refactor HttpTemplateDownloader contructor cleanup
* CLOUDSTACK-9899 refactor HttpTemplateDownloader.download() cleanup
* CLOUDSTACK-9899 add the new config key to configurable
* CLOUDSTACK-9899 refactor download method
* CLOUDSTACK-9899 less verbose setting comment
* CLOUDSTACK-9899 debug message to indicate checking happened
* CLOUDSTACK-9899 typi flase -> false
This causes VM deployment failure on the host that was disabled while adding the storage repository.
In the attachCluster function of the PrimaryDataStoreLifeCycle, we were only selecting hosts that are up and are in enabled state. Here if we select all up hosts, it will populate the DB properly and will fix this issue. Also added a unit test for attachCluster function.
Fix: Block VMOperations if Host in PrepareForMaintenance mode. VM operations (Stop, Reboot, Destroy, Migrate to host) are not allowed when Host in PrepareForMaintenance mode.
Added ability to specify mac in deployVirtualMachine and
addNicToVirtualMachine api endpoints.
Validates mac address to be in the form of:
aa:bb:cc:dd:ee:ff , aa-bb-cc-dd-ee-ff , or aa.bb.cc.dd.ee.ff.
Ensures that mac address is a Unicast mac.
Ensures that the mac address is not already allocated for the
specified network.
This can happen when you stop a VM in one cluster and start a VM in another cluster. When the VM starts in a new cluster, we don't add a new VAG and hence it fails to start. This PR ensures that we call grantAccess to the VM that gets started which will fix the access issue.
Snapshots are not deleted resulting unexpected storage consumption in case of VMware.
Steps to reproduce this issue :
In VMware setup, create a snapshot of volume say Snap1.
After successful creation of snapshot Snap1, create new snapshot of same volume say Snap2.snapshots
While Snap2 is in BackingUp state, delete Snap1.
Snap1 will disappear from Web UI, but when we check secondary storage, files associated with Snap1 still persists even after cleanup job is performed.
In snapshot_store_ref table in DB, Snap1 will be in ready state instead of Destroyed.
Also, in snapshots table, status of Snap1 will be Destroyed but removed column will be null and will never change to the date of snapshot removal.
Fix for this issue :
In VMware, snapshot chain is not maintained, instead full snapshot is taken every time.
So, it makes sense not to assign parent snapshot id for the snapshot. In this way, every snapshot will be individual and can be deleted successfully whenever required.
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes issue of enabling dynamic roles based on the global setting
only. This also fixes application of the default role/permissions mapping
on upgrade from 4.8 and previous versions to 4.9+.
Previously, it would make additional check to ensure commands.properties
is not in the classpath however this creates confusion for admins who
may skip/skim through the rn/docs and assume that mere changing the
global settings was not enough.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows changing permission for existing role permissions, as those were static and could not be changed once created. It also provides the ability to change these permissions in the UI using a drop down menu for each permission rule, in which admin can select ‘Allow’ or ‘Deny’ permission.
Changes in the API:
This feature modifies behaviour of updateRolePermission API method:
New optional parameters ‘ruleid’ and ‘permission’ are introduced, they are mutual exclusive to ‘ruleorder’ parameter. This defines two use cases:
Update role permission: ‘ruleid’ and ‘permission’ parameters needed
Update rules order: ‘ruleorder’ parameter needed
Parameter ‘ruleorder’ is now optional
updateRolePermission providing ‘ruleorder’ parameter should be sent via POST
Default value of the account level global config vmsnapshot.expire.interval is -1 that conforms to legacy behaviour. A positive value will expire the VM snapshots for the respective account in that many hours.
CloudStack has several background polling tasks that are spread across
the codebase, the aim of this work is to provide a single manager to
handle submission, execution and handling of background tasks. With
the framework implemented, existing oobm background task has been
refactored to use this manager.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
removed code which nullifies vm_instance_id
Also modified QueryManagerImpl to ignore volume which does not have uuid. This is to avoid duplicate volume listing.
(cherry picked from commit 3cced927c4)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In case of vmware host failure, all the VMs including stopped VMs migrate
to the new host. For the Stopped Vms powerhost gets updated. This was
triggering HandlePowerStateReport which finally calls updatePowerState
updating update_time for the VM. This cause the capacity being reserved
for stopped VMs.
(cherry picked from commit 9d268c8cd5)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
store ref table so builtin template is never downloaded completely
In handleSysTemplateDownload method creating template only if there exists no entry
handleTemplateSync will take care of other scenario
(cherry picked from commit 929595c114)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Update the volume id in volume_store_ref table to newly created volume for migration
(cherry picked from commit 42b89278e9)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
as that snapshot will never be going to use again and also it will fill up primary storage
(cherry picked from commit 336df84f17)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Without this information a NPE might be triggered when starting a VR, SSVM or CP
as this information is read from the 'nics' table and causes a NPE.
During deployment we should set the IPv6 Gateway and CIDR for the NIC object so that
it is persisted to the database.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
(cherry picked from commit f661b631a1)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This commit contains following changes
(1) add CPU CORE information in op_host_capacity
(2) add capacity name in the CapacityResponse
(3) add allocatedCapacity for CPU/MEMORY/CPU CORE for zones
(4) sort CapacityResponse by zonename and CapacityType
CLOUDSTACK-9669:egress destination cidr VR python script changes
CLOUDSTACK-9669:egress destination API and orchestration changes
CLOUDSTACK-9669: Added the ipset package in systemvm template
CLOUDSTACK-9669:Added licence header for new files
CLOUDSTACK-9669: replacing 0.0.0.0/0 with the network cidr
ipset member add with 0.0.0.0/0 fails. So 0.0.0.0/0 replaced with the network cidr.
In source cidr 0.0.0.0/0 is nothing but network cidr.
updated the default egress all cidr with network cidr