If the resource state of hypervisor in "Maintenance" then it
should be considered as offline even though the agent state
is "Up". Since its in maintenance mode, it cant be used to
allocate VM's and hence can't be considered towards resource
allocation
If vm has last host_id specified, cloudstack will try to start vm on it at first.
However, host tag is checked, but guest os preference is not checked.
for new vm, it will be deployed to the preferred host as we expect.
Fixes: #3554 (comment)
* added defensive checks for avoiding NPE and list projects API fix
* list projects with account name provided to not include users in the account in response
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This feature enables the following:
Balanced migration of data objects from source Image store to destination Image store(s)
Complete migration of data
setting an image store to read-only
viewing download progress of templates across all data stores
Related Primate PR: apache/cloudstack-primate#326
* Display acl name in listNetworks response
Display acl name along with its id so that we
dont need to make extra api call to get acl name
* Add since tag
After a few hours running with InfluxDB configured, CloudStack hangs due to OutOfMemoryException raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):
2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
...
at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
Context on InfluxDB Batch: Enabling batch on InfluxDB is great and speeds writing but it requires caution to avoid Zombie threads.
Solution: This happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to add: influxDB.close().
When executing request assignVirtualMachine with null domainID and a valid projectID then a NullPointerException happens at DomainChecker.java.
Command example:
assign virtualmachine virtualmachineid=vmID projectid=projectID account=admin
The NullPointerException that is thrown at DomainChecker is handled at AssignVMCmd.java#L142, resulting in the following log message: Failed to move vm null.
While remove secondary nic from a Running vm, if update the default nic to the secondary nic before the nic is removed, the vm will not have default nic (and cannot be started) when both operations are completed.
It is because UpdateDefaultNic api is not handled as a vm work job (AddNicToVMCmd and RemoveNicFromVMCmd are), it is processed before nic is removed. The result is that secondary nic becomes default nic and got removed.
This PR aims to fix the issue below
Create a network offering for isolated network, services: Dns/Dhcp/Userdata, and enable it
create a isolated network with the new offering
create a vm
check the guest IP of virtual router,
restart network with cleanup
check the guest IP of new virtual router
The IP in step4 and step6 should be the same, but they are different actually.
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
The "hypervisor" field in listvmsnapshot response will
be used in primate to enable/disable creating snapshot
from vm snapshot functionality.
Creating snpashot from vm snapshot will be enabled only if
hypervisor is KVM
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861
The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
* Prevent null pointer on listPublicIpAddress cmd
Insert an inner join between data_center table and user_ip_address where data_center.removed field is null
* Remove extra join and add a filter for VLAN removed
When the static route service is not available on the VPC and a static route is created, the static route is created in a revoked state.
Currently, the UI doesn't distinguish between active or revoked static routes.
This PR adds the missing state filter to the list routes command and only lists active routes in the UI.
It also ignores revoked routes when the private gateway is being removed but clears out the inactive routes before the gateway is removed.
Fixes#2908
This PR adds implementation for changing host and storage name, additionally, it fixes a Bug on cluster updateCluster API command. This PRs also enhances the UI by allowing editing field name on Host and Storage pool. Due to the fact that there is no support to editing cluster via UI, it was not edited.
TODO: I will address Host, Cluster, and Storage Pool name edition on CloudStack Primate once the API implementation gets merged.
Details:
Prior to this PR the following API commands did not offer support for updating name:
updateHost (enhancement)
updateStoragePool (enhancement)
Additionally, updateCluster claims to support changing a cluster name (via clustername parameter); however, such operation did not work. (bug)
This fixes issues of virtual size to be twice in case the disk is a
linked-clone root disk. The virtual size of root disk (first in chain)
must be used.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Adding the following fixes so primate can work without issues :
- Adding pagination for listNetworkAclLists
- Adding pagination for listRoles
- Returning mshost uuid rather than msid in list hosts response
- Allowing listVirtualMachinesMetrics to respect hostid
- Fixing return all details in template response
This will purge all the cookies on logout including multiple sessionkey
cookies if passed. On login, this will restrict sessionkey cookie
(httponly) to the / path.
Fixes#4136
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This change will ensure that B&R APIs are not exported if the feature
is not enabled in any of the zones.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Create a role from any of the existing role, using new parameter roleid in createRole API
- Import a role with its rules, using a new importRole API
- New default roles for Read-Only and Support Admin & User
- No modifications allowed for Default roles
- Cleaned up old NetApp APIs from role_permissions table.
While migrate a vm, in the popup, the host dedicated to other accounts/domains are also 'Suitable" for migration, which is obviously wrong.
The same issue happens with api findHostsForMigration
* Enable unmanaging guest VMs
* Minor fixes
* Fix stop usage event only if VM is not stopped when unmanaging
* Rename unmanaged VMs manager
* Generate netofferingremove usage event if VM is not stopped
* Generate usage event VM snapshot primary off when unmanaging
If we resize a volume of a vm running on a host which is not Up or not Enable, the job will be scheduled to another normal host. Then the volume will be resized by "qemu-img resize" instead of "virsh blockresize", the image might be corrupted after resize.
Adding missing fields in the following APIs
osdisplayname in listVirtualMachines
vpcofferingname in listVpcs
vpcname in listPublicIpAddresses
vpcname in listPrivateGateways
vpcname in listVpnGateways
templatename, podname in listRouters
templatename, podname in listSystemVms
Fixes: #4161
Fixes wrong count in listAffinityGroup API.
API was returning the count of AffinityGroupJoinVO records.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This PR fixes an issue where an instance fails to deploy due to a null pointer when using an L2 Guest Network with DefaultL2NetworkOfferingConfigDrive on Xenserver. It also fixes migrating an instance to another host.
This has been tested by:
- Creating an L2 Guest network, using DefaultL2NetworkOfferingConfigDrive as the network offering.
- Deploying an instance using the L2 Guest network created.
- Migrating the instance away from the host and back
Repro Steps:
1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. NPE occurs when MS tries to find host to migrate the VM and then scale.
Root cause: VM profile is not initiated properly with serviceoffering before planning for deployment
Solution: Iniate VM profile with serviceoffering and also make sure custom compute parameters are handled
When a VPC is restarted, the networks in the VPC is not restarted, this PR will add the logic to restart the networks in the VPC that needs a restart when the VPC is restarted.
Fixes#3816
BackupSync task would switch between databases to update backup usage
metrics in the cloud_usage.usage_backup table. The current framework
and the usage in ManagedContext causes database connection
(LegacyTransaction) leaks. When the thread runs faster, the issue is
easily reproducible and checking via heap dump analysis or using JMX
MBeans. This fixes by moving the task of backup data updation for
usage data to the usage server by publishing usage events instead of
switching between databases in a local thread while in a
ManagedContextRunnable.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Root cause:
Even though dynamic scaling job is handled in vmworkjob queue which ensures serilizing multiple jobs but the database updating and generating usage events are out of the job queue.
Solution:
Moved all updations into the job queue
Firstly I have tested all the scenarios to check if nothing is broken:
Scaling on a running VM with normal compute offering
Scaling on a stopped VM with normal compute offering
Scaling on a running VM with custom compute offering
Scaling on stopped VM with custom compute offering
Scaling on stopped/running VM between custom compute offering and normal compute offering and combinations among these. Checked if the custom parameters have been populated or deleted accordingly based on the offering to which the VM is scaled
Since this is a corner scenario I could not test the exact point where two usage events are recorded at the same time for two different API calls on same VM.
Fix to pick the max data volumes limit using the actual hypervisor version, instead of "default" version. Use the hypervisor version in the host table when product_version parameter in host details doesn't exist or is empty
Fixes#4101
In the list publicipaddress api call, display the network
name if ip is associated to shared network
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
Allow VR's to be searched using its redundant state
Under Infrastructure -> Virtual Routers -> Search box
we can search using "MASTER", "BACKUP" and this will display
the VR's matching the state.
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
Only admins should be able to search VM by instance name
Customers should not see or serach VM's using the instance name (i-)
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
This update turns on certificate revocation checking for uploaded certificates:
- Updated `CertServiceImpl` to be able to enable revocation checking.
- Introduced a new parameter `ENABLED_REVOCATION_CHECK` for `UploadSslCertCmd`.
- Updated `CertServiceTest`.
Even if no CLRs are specified via `PKIXParameters`, the certificates
themselves may still provide info for revocation checking:
- The AIA extension may contains a URL to the OCSP responder.
- The CLRDP extension contains a URL to the CLR.
Those extensions may need to be explicitly enabled by setting the system properties `com.sun.security.enableAIAcaIssuers` and `com.sun.security.enableCRLDP` to true. See [Java PKI Programmer's Guide](https://docs.oracle.com/en/java/javase/11/security/java-pki-programmers-guide.html).
Using a revoked certificate may be dangerous. One of the most common reasons why a certificate authority (CA) revokes a certificate is that the private key has been compromised. For example, the private key might have been stolen by an adversary.
If I understand correctly, the `CertServiceImpl` bean is used for operations with certificates on a load balancer. In particular, it validates a certificate chain without revocation checking while uploading a certificate. If a compromised revoked certificate is then used by the load balancer, then it may result to compromising TLS connections. However, the attacker has to be able to implement man-in-the-middle attack to compromise the connections. So the attacker has to be quite powerful. Therefore, such an attack is definitely not easy to implement. On the other hand, the impact may be significant because of loss of confidentiality.
This has been discussed on security@cloudstack.apache.org
* 4.13:
Snapshot deletion issues (#3969)
server: Cannot list affinity group if there are hosts dedicated… (#4025)
server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)
Though VMware does not support security groups, but in a basic zone with VMware and no isolation VMs should be able to deploy.
Root cause:
In case of VMware and basic zone control nic is set to 0.0.0.0 assuming control network will be shared with guest network.
But to have access to VMware instances management/private needs to be assigned to it.
Solution:
Assing a private ip even in case of basic zone VMware.
The listZonesMetrics does not return same keys are listZones as the
default response view is restricted. This fixes that by ensuring that
for root admin full response view is used.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Validate API IOPS normal and maximum read/write values.
Ensures that normal read/write cannot be greater than Maximum
read/write. Additionally, it was added a global settings
'iops.maximum.rate.length'.
'iops.maximum.rate.length' sets the maximum IOPS read/write length
(seconds) accepted; thus, preventing irrealistic values for a disk
offering (e.g. hours or days of burst IOPS). The default value is 0
(zero) and allows any IOPS maximum rate length. Example:
iops.maximum.rate.length = 3600 sets the maximum IOPS length
accepted for a disk offering as 3600 seconds (60 minutes).
* Fix log String.format message from %s to %d
* Add bytes rate validation
* Refactoring to cover Read/Write Bytes and IOPS length validation
* Fix "copy-paste" issue with bytes write rate max length
Change Response view to Full for Admin user
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
As previously described by PR #3929:
If vm has attached ISO, the migration fails with error message "org.libvirt.LibvirtException: Cannot access storage file /mnt/b33e5a1d-e4ea-3465-b6ac-c98dc8ff8af0/207-2-cc5fd717-2d57-3bb3-bcf6-2c930268db6c.iso"
* Userdata to display static NAT as public ip instead of VR ip
If static nat is enabled on VM then metadata service should return
the static nat instead of gateway IP.
If static not is not enabled then it should return the gateway IP
as the public IP
Test results:
Step to reproduce:
1. Create a vm
2. Ssh to vm.
3. Run the below command inside the vm
wget http://<VR public ip>/latest/meta-data/public-ipv4
Note down the output of the above command
4. Now acquire a new public and enable static NAT on that IP to this vm
5. Now run the same command mentioned above in the VM
This should display the static NAT ip instead of VR public IP
Output:
Before enabling static nat
wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.10.10.29
After enabling static nat
wget http://10.10.10.40/latest/meta-data/public-ipv4
$ cat public-ipv4
10.11.10.30
* server: apply vm user data when release a public ip
Co-authored-by: Wei Zhou <ustcweizhou@gmail.com>
in 4.13, list sshkeypairs with keyword will ignore the search by name if name is specifed
Fixes an issue in #3098
for example,
(local) > list sshkeypairs name=wei keyword=wei filter=name
{
"count": 3,
"sshkeypair": [
{
"name": "wei3"
},
{
"name": "wei2"
},
{
"name": "wei"
}
]
}
with this patch ,it gives correct result.
(local) > list sshkeypairs name=wei keyword=wei filter=name
{
"count": 1,
"sshkeypair": [
{
"name": "wei"
}
]
}
When both routers of VPC is in MASTER state
then multiple alerts are sent equally to the number of tiers in the VPC.
If the VPC has 3 tiers then 6 alerts will be sent. This is not good
if VPC has more than 10 networks in it.
Instead of checking the router status for all the tiers in the VPC,
just check the status of the router for one tier in a VPC so that
multiple duplicate alerts can be avoided
Currently, the cloudstack sends VM password only to the first
router in the network even if its the backup and return the result.
In some cases the first router will be back up and the second will be master.
Since password server is not running in backup, when the user resets the password,
it is sent to the first router which can be backup.
In that case, the new password is not stored in the password server and users cant log in with a new password.
This change ensures that we send the password to both the routers instead
of the first router so that a new password is stored in the master router.
When we add new guest os, sometimes we missed the records in guest_os_hypervisor.
However, the guest disk model (virtio/ide) is determined by record in the table.
It causes the issue that some new guest os(eg Debian 8/9) uses e1000 instead of virtio nic, and ide disk instead of virtio disk.
To fix the issue permanantly, pass the guest os name in guest_os if the record for kvm is not found in guest_os_hypervisor.
Related commit:7ac9f00eeeb4cd37ec39efeba066e799b581b1a0
By default, once we create a security group we cant change its name.
In this feature, we introduce a new API command "updateSecurityGroup"
which allows us to rename the security group name. Although we can't
change the name of the "default" security group.
When the state of the site to site vpn changes, the check
is done on all the virtual routers including the internal
load balancing vm as well. It is not needed to check the
state for internal load balancing vm
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This implements the systemvm list API response creator to find and use
the host record for a ssvm/cpvm to get the agent status and other
details like last disconnected date and agent version.
Fixes 3875
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Enable PVLAN support on L2 networks
* Fix prevent null pointer on details
* Add marvin tests
* Fixes from comments
* Fix: missing pvlan type on plugniccommand
* Fix checks on network creation for vlans overlap
* Fix remove prefix from secondary vlan id
* Improve checks on physical network for pvlans
* Fix compatibility with previous pvlan creation
* Fix shared networks backwards pvlan compatibility
* Add ui fix for pvlan type not passed to api
* Add check for isolated vlan id overlap
* Include check for dynamic vlan reserved for secondary vlan
* Fix marvin tests errors
* Fix redundant imports
* Skip marvin test for pvlan if dvswitch is not present
* spelling
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
This makes the listSystemVms API to return the host status (agent state),
version and last pinged information. This makes it possible for UIs
to call a single API to get this information.
* server: fix resource count of primary storage if some volumes are Expunged but not removed
Steps to reproduce the issue
(1) create a vm and stop it. check resource count of primary storage
(2) download volume. resource count of primary storage is not changed.
(3) expunge the vm, the volume will be Expunged state as there is a volume snapshot on secondary storage. The resource count of primary storage decreased.
(4) update resource count of the account (or domain), the resource count of primary storage is reset to the value in step (2).
* New feature: Add support to destroy/recover volumes
* Add integration test for volume destroy/recover
* marvin: check resource count of more types
* messages translate to JP
* Update messages for CN
* translate message for NL
* fix two issues per Daan's comments
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
After a local template is uploaded via browser, the generated usage event with type = "TEMPLATE.CREATE" is persisted with the data store ID instead of the zone ID on the zone_id column. The fix will refactor the upload monitor logic, as after the upload completes, it sets the datastore ID on the zone ID column for the created "TEMPLATE.CREATE" usage event. This refactor will query the DB for the data store and will set its associated zone ID in the usage field.
The fix produces the same behaviour as when registering a template from URL.
FIx is also for uploading VOLUME from local/via browser.
When we restart the VPC after destroying the master VR, the backup VR
becomes the master and the site to site connections are not in
the connected state. The passive VPN connection will be in the connected
state but the active VPN connection will be in a disconnected state.
The VM ingestion feature allows CloudStack to discover, on-board, import existing VMs in an infra. The feature currently works only for VMware, with a hypervisor agnostic framework which may be extended for KVM and XenServer in future.
Associating static NAT on IP to VM fails even though the IP is not allocated.
When we try enable static NAT on second IP address to the same VM, the operation fails but the IP address is still allocated in the db and it can't be used to enable static NAT on different VM.
Steps to reproduce the issue:
(1) create a vpc (vpc-001) and a vpc tier (vpc-001-001)
(2) create a vm (vm-001-001) in vpc-001-001
(3) acquire a public ip (ip-1) and enable static nat to vm-001-001,
operation succeeds.
(4) acquire a public ip (ip-2) and enable static nat to vm-001-001,
operation fails but the ip is still assigned to vpc tier vpc-001-001.
Note down the ip address and the id of it.
(5) create another vpc tier vpc-001-002, and vm (vm-001-002) in the tier
(6) enabled ip-2 static nat to vm-001-002, operation should succeed
Add a global setting to disable creating networks with same name in an account
Add a global setting to disable creating network without
mentioning the start and end IPv4 or IPv6 address
By default we can create networks with the same name in the account.
Sometimes we should not create the networks with same name.
This change adds a global setting which prevents creating the network with same name.
The default value is true and set it to false to prevent creating network with same names.
Also its possible to create a shared network without mentioning the
start and the end IPv4 or IPv6 address.
This change adds a global setting which prevents creating a shared
network without specifying the start and the end IPv4 or IPv6 address
Fixes#3783
As reported in the issue, creating volumes from pure snapshot fails with NPE. This is due to order of calls where disk offering access is checked before checking disk offering value. This PR fixes the same.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Fixes#3191
When a template is registered, code stores md5sum of the downloaded file in the vm_template table. However, this downloaded file could be deleted after template installation if it is not an actual (.qcow2, .ova, etc.) file. When the user copies a template using copyTemplate API, the actual template file will be copied across the image stores. Matching checksum for the copied templated file and the stored value from the vm_template table will result in a mismatch.
Changes will set an empty checksum value for the copied template while passing to download service which allows skipping wrong checksum check for the copied while install.
However, this results in a change in checksum value for concerned template entry in vm_template table post template install.
Co-authored-by: dahn <daan.hoogland@gmail.com>
* [CLOUDSTACK-10408] Fix String.replaceAll() to replace() for better performance
* improve with replace char but string
Co-authored-by: Rohit Yadav <rohit@apache.org>
* marvin: check resource count of more types
* New feature: add flag resource.count.running.vms.only to count resource consumption of only running vms
Stopped VMs do not use CPU/RAM actually.
A new global configuration resource.count.running.vms.only is added to determine whether resource (cpu/memory) of only running vms (including Starting/Stopping) will be taken into calculation of resource consumption.
* Add integration test for resource count of only running vms
The List Management Server api returns a list of all the management servers but fails when trying to list by id or name. This ensures that it fetches the details as per the parameters passed.
Fixes: #3833
The metrics API has few properties missing that are present in the corresponding resource.
Fixes#3831
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Rohit Yadav <rohit@apache.org>
Steps to reproduce the issue
(1) create a custom service offering
(2) create a vm with the offering
(3) update vm with displayvm=false, returns an error
(local) > update virtualmachine id=f33fd06a-7643-40d1-833f-272845d9ba09 displayvm=false
Error 530: {"updatevirtualmachineresponse":{"uuidList":[],"errorcode":530,"cserrorcode":9999}}
When start a vm or migrate a vm (away from a host in host maintenance), cloudstack will check capacity of all hosts and choose one. If there are hundreds of hosts on the platform, it will take some seconds. When cloudstack choose a host and start/migrate vm to it, the resource consumption of the host might have been changed. This normally happens when we start/migrate multiple vms.
It would be better to double check the host capacity when start vm on a host.
This PR includes the fix for cpucore capacity when start/migrate a vm.
When we calculate a resource consumption of a host, we need to take the vms in following states into calculation: Running, Starting, Stopping, Migrating (to the host), and vms are Migrating from the host. Because, when stop a vm, the resource on host will be released when vm is stopped. When migrate a vm, the resource on destination host will be increased before migration starts, and resource on source host will be decreased after migraiton succeeds.
In cloudstack, there is a task named CapacityChecked which run every 5 minutes (capacity.check.period =300000 ms by default). It recalculates capacity of all hosts. However, it takes only vms in Running and Starting into consideration. We have faced some issues in host maintenance due to it.
Steps to reproduce the issue
(1) migrate N vms from host A to host B, cpu/ram resource increases before the migration.
(2) capacity check recalculate the capacity of hosts. used capacity of Host B will be reset to original value (not including the vms in Migrating).
(3) migrate some more vms from other host to host B, the migrations are allowed by cloudstack (because used capacity is incorrect). If the actual used memory exceed the physical memory on the host, there might be some critical issues (for example, libvirt dies)
Steps to reproduce the issue
(1) create an account (test)
(2) create a vm with the account (test)
(3) login with admin, and upgrade the vm to another offering
(4) the resource count (cpu,memory) of admin increases, not the account (test).
* pass domainid for list users
* passing arg in wizzard
* adding userfilter to list ldap users and usersource to response
port of list ldap users tests to java
* assertion of differnt junit ldap methods
* broken test for directory server (and others)
* embedded context loading
* add user and query test
* UI: filter options passing filter and domain and onchange trigger
* disable tests that only work in ide
prereqs for domain-linkage fixed
move trigger to the right location in code
trigger for changing domain
* logging, comments and refactor
implement search users per domain
retrieve appropriate list of users to filter
get domain specific ldap provider
* query cloudstack users with now db filter
* recreate ldap linked account should succeed
* disable auto import users that don't exist
* ui choice and text
* import filter and potential remove from list bug fixed
* fix rights for domain admins
* list only member of linked groups not of principle group
* Do not show ldap user filter if not importing from ldap
do not delete un-needed items from dialog permanently
delete from temp object not from global one
* localdomain should not filterout users not imported from ldap
* several types of authentication handling errors fixed and unit tested
* conflict in output name
* add conflict source field to generic import dialog
* replace reflextion by enum member call
* conflict is now called conflict 🎉
* Update message when keys are NOT being injected
* Correct the message after injectkeys.ssh is done
* Update message to a more meaningful one, since sometimes nothing is injected
* Update other 2
* typo
* * Complete API implementation
* Complete UI integration
* Complete marvin test
* Complete Secondary storage GC background task
* improve UI labels
* slight reword and add another missing description
* improve download message clarity
* Address comments
* multiple fixes and cleanups
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* fix more bugs, let it return ip rule list in another log file
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* fix missing iprule bug
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* add support for ARCHIVE type of object to be linked/setup on secstorage
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Fix retrieving files for Xenserver
* Update get_diagnostics_files.py
* Fix bug where executable scripts weren't handled
* Fixed error on script cmd generation
* Do not filter name for log files as it would override similar prefix script names
* Addressed code review comments
* log error instead of printstacktrace
* Treat script as executable and shell script
* Check missing script name case and write to output instead of catching exception
* Use shell = true instead of shlex to support any executable
* fix xenserver bug
* don't set dir permission for vmware
* Code review comments - refactoring
* Add check for possible NPE
* Remove unused imoprt after rebase
* Add better description for configs
Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
Co-authored-by: Rohit Yadav <rohit@apache.org>
Co-authored-by: Anurag Awasthi <anurag.awasthi@shapeblue.com>
* Suqash commits to a single commit and rebase against master
Update marvin tests to use white list
* * Fix marvin test failure
* Add new marvin negative tests cases
* Remove hard-coded hypervisor types in marvin tests
* Fix build error after rebase and add hugepagesless
* Fix readability of python code
* Fix failing test
* Adding cleanup of vms for negative tests
* Bug fixes - change config checks properly and block extraconfig in details
* Trim to compare the keys
* CR comments
* Don't skip extraconfig without exception
Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
Currently while creating ingress/egress rule for a security group,
we can specify only TCP/UDP/ICMP. Sometimes we need to add rules
for different protocol number or rules for all the above three
mentioned protocols.
In this new feature users can specify the protocol number or select
"ALL" option which will apply rules for TCP/UDP/ICMP
Currently in cloudstack, when we click on "Acquire New Ip", it will
randomly acquire IP from the pool. With this enhancement, it is
possible to select the IP from the drop down IP list of that network.
Same thing applies for a VPC as well.
* 4.13:
Added zone check for attach iso (#3755)
config: add isdynamic flag in configuration response (#3729)
filter hosts to query on zone wide storage (#3733)
convert protocal names to be found as labels (#3747)
Once again allow a VM to be on multiple networks from VPCs (#3754)
create template from snapshot regression (partly reverted) (#3767)
* create template from snapshot regression (partly reverted) (#3767)
* Once again allow a VM to be on multiple networks from VPCs (#3754)
to once again allow a VM to be on multiple networks from VPCs
* convert protocal names to be found as labels (#3747)
* convert protocal names to be found as labels
* format
* filter hosts to query on zone wide storage (#3733)
* config: add isdynamic flag in configuration response (#3729)
Co-authored-by: Wei Zhou <ustcweizhou@gmail.com>
* Service layer changes for new way of tracking maintanence progress
* Fixes after offline code review
* Fix marvin tests
* Change state name and add documentation
* Fix test
* Fix and add more unit tests for different caseS
* Fix and enhance Marvin Tests
* Fixes for corner cases
* More fixes and logging
* UI fixes
* Some minor changes and reducing VMs on host for more contained tests
* Fixed ssh client auth problem causing test failure
* Code review changes + fixes + some more logging
* Fix flaky tests by adding delays between host states
* Added fetching only enabled hosts for tests
* Make port blocking KVM specific and refactor to handle failure
* Make failing migrations due to tagged host instead of port blocking
* Added additional check for migrating VMs
* Refactor to use single place for methods checking maintenance states
* Add missing HA config keys
* Change time value to seconds
* Change Integer to Long
* Using ConfigKey defaultValue
* Do some code refactoring
* Simplify code
* Avgload (#2)
* Adding avgload for kvm
* Fix coding style issue
* Add getter/setter
* Fix several small errors
* Add override
* Uncomment getAverageLoad
* Override getAverageLoad()
* Checkstyle bug?
* Delete trailing spaces
* Renaming function
* Change interface to match
* Rename method in GetHostStatsAnswer
* Change method call name
* Convert double to long
* Remove trailing whitespace
* Change names around
* Make load visible to return it
* Parse string to double
* Change Long to Double
* Fix getter
* Unify naming to cpuloadaverage
* Change cpuloadaverage String to Double in listHostsMetrics
Remove some unnecessary whitespaces
* Add CPU_LOAD_AVERAGE to ApiConstants
After commit fbf488497f, admin need to specify an ipv4 or ipv6 addresses when add IP to nic which breaks backward compatibity. If IP is not specified, a IPv4 address should be returned.
KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.
This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/
I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.
I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have
KVM enabled by default.
Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* server: Do NOT cleanup dhcp and dns when stop a vm
According comment in PR #3608, dhcp and dns entries are cleaned up only when a VM is expunged.
Revert part of commit 8fb388e931.
* server: cleanup dns/dhcp entries in removeNic instead of finalizeExpunge
This fixes a behaviour to not cleanup DHCP and DNS rules for NICs of a
VM in the VR when it is stopped, but instead when VM is expunged because
stopped VMs in CloudStack still retain the IPs and records.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In case of null guest OS found for a template, don't fail prioritisation
completely (could still work based on HVM etc).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Refactor: Cleanup duplicate code
Make use of Java 8 default implementation in interfaces,
to remove code duplication between XxxCmd and XxxCmdAsAdmin.
Refactor checkFormat by pre-calculating the supported
extensions. Also make use of this in ImageStoreUtil.
Makes it easier to add new file and compression formats.