If the resource state of hypervisor in "Maintenance" then it
should be considered as offline even though the agent state
is "Up". Since its in maintenance mode, it cant be used to
allocate VM's and hence can't be considered towards resource
allocation
If vm has last host_id specified, cloudstack will try to start vm on it at first.
However, host tag is checked, but guest os preference is not checked.
for new vm, it will be deployed to the preferred host as we expect.
Fixes: #3554 (comment)
* added defensive checks for avoiding NPE and list projects API fix
* list projects with account name provided to not include users in the account in response
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This feature enables the following:
Balanced migration of data objects from source Image store to destination Image store(s)
Complete migration of data
setting an image store to read-only
viewing download progress of templates across all data stores
Related Primate PR: apache/cloudstack-primate#326
* Display acl name in listNetworks response
Display acl name along with its id so that we
dont need to make extra api call to get acl name
* Add since tag
After a few hours running with InfluxDB configured, CloudStack hangs due to OutOfMemoryException raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):
2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
...
at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
Context on InfluxDB Batch: Enabling batch on InfluxDB is great and speeds writing but it requires caution to avoid Zombie threads.
Solution: This happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to add: influxDB.close().
When executing request assignVirtualMachine with null domainID and a valid projectID then a NullPointerException happens at DomainChecker.java.
Command example:
assign virtualmachine virtualmachineid=vmID projectid=projectID account=admin
The NullPointerException that is thrown at DomainChecker is handled at AssignVMCmd.java#L142, resulting in the following log message: Failed to move vm null.
While remove secondary nic from a Running vm, if update the default nic to the secondary nic before the nic is removed, the vm will not have default nic (and cannot be started) when both operations are completed.
It is because UpdateDefaultNic api is not handled as a vm work job (AddNicToVMCmd and RemoveNicFromVMCmd are), it is processed before nic is removed. The result is that secondary nic becomes default nic and got removed.
This PR aims to fix the issue below
Create a network offering for isolated network, services: Dns/Dhcp/Userdata, and enable it
create a isolated network with the new offering
create a vm
check the guest IP of virtual router,
restart network with cleanup
check the guest IP of new virtual router
The IP in step4 and step6 should be the same, but they are different actually.
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
The "hypervisor" field in listvmsnapshot response will
be used in primate to enable/disable creating snapshot
from vm snapshot functionality.
Creating snpashot from vm snapshot will be enabled only if
hypervisor is KVM
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861
The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
* Prevent null pointer on listPublicIpAddress cmd
Insert an inner join between data_center table and user_ip_address where data_center.removed field is null
* Remove extra join and add a filter for VLAN removed
When the static route service is not available on the VPC and a static route is created, the static route is created in a revoked state.
Currently, the UI doesn't distinguish between active or revoked static routes.
This PR adds the missing state filter to the list routes command and only lists active routes in the UI.
It also ignores revoked routes when the private gateway is being removed but clears out the inactive routes before the gateway is removed.
Fixes#2908
This PR adds implementation for changing host and storage name, additionally, it fixes a Bug on cluster updateCluster API command. This PRs also enhances the UI by allowing editing field name on Host and Storage pool. Due to the fact that there is no support to editing cluster via UI, it was not edited.
TODO: I will address Host, Cluster, and Storage Pool name edition on CloudStack Primate once the API implementation gets merged.
Details:
Prior to this PR the following API commands did not offer support for updating name:
updateHost (enhancement)
updateStoragePool (enhancement)
Additionally, updateCluster claims to support changing a cluster name (via clustername parameter); however, such operation did not work. (bug)
This fixes issues of virtual size to be twice in case the disk is a
linked-clone root disk. The virtual size of root disk (first in chain)
must be used.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Adding the following fixes so primate can work without issues :
- Adding pagination for listNetworkAclLists
- Adding pagination for listRoles
- Returning mshost uuid rather than msid in list hosts response
- Allowing listVirtualMachinesMetrics to respect hostid
- Fixing return all details in template response
This will purge all the cookies on logout including multiple sessionkey
cookies if passed. On login, this will restrict sessionkey cookie
(httponly) to the / path.
Fixes#4136
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
This change will ensure that B&R APIs are not exported if the feature
is not enabled in any of the zones.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Create a role from any of the existing role, using new parameter roleid in createRole API
- Import a role with its rules, using a new importRole API
- New default roles for Read-Only and Support Admin & User
- No modifications allowed for Default roles
- Cleaned up old NetApp APIs from role_permissions table.
While migrate a vm, in the popup, the host dedicated to other accounts/domains are also 'Suitable" for migration, which is obviously wrong.
The same issue happens with api findHostsForMigration
* Enable unmanaging guest VMs
* Minor fixes
* Fix stop usage event only if VM is not stopped when unmanaging
* Rename unmanaged VMs manager
* Generate netofferingremove usage event if VM is not stopped
* Generate usage event VM snapshot primary off when unmanaging
If we resize a volume of a vm running on a host which is not Up or not Enable, the job will be scheduled to another normal host. Then the volume will be resized by "qemu-img resize" instead of "virsh blockresize", the image might be corrupted after resize.
Adding missing fields in the following APIs
osdisplayname in listVirtualMachines
vpcofferingname in listVpcs
vpcname in listPublicIpAddresses
vpcname in listPrivateGateways
vpcname in listVpnGateways
templatename, podname in listRouters
templatename, podname in listSystemVms
Fixes: #4161
Fixes wrong count in listAffinityGroup API.
API was returning the count of AffinityGroupJoinVO records.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This PR fixes an issue where an instance fails to deploy due to a null pointer when using an L2 Guest Network with DefaultL2NetworkOfferingConfigDrive on Xenserver. It also fixes migrating an instance to another host.
This has been tested by:
- Creating an L2 Guest network, using DefaultL2NetworkOfferingConfigDrive as the network offering.
- Deploying an instance using the L2 Guest network created.
- Migrating the instance away from the host and back
Repro Steps:
1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. NPE occurs when MS tries to find host to migrate the VM and then scale.
Root cause: VM profile is not initiated properly with serviceoffering before planning for deployment
Solution: Iniate VM profile with serviceoffering and also make sure custom compute parameters are handled
When a VPC is restarted, the networks in the VPC is not restarted, this PR will add the logic to restart the networks in the VPC that needs a restart when the VPC is restarted.
Fixes#3816
BackupSync task would switch between databases to update backup usage
metrics in the cloud_usage.usage_backup table. The current framework
and the usage in ManagedContext causes database connection
(LegacyTransaction) leaks. When the thread runs faster, the issue is
easily reproducible and checking via heap dump analysis or using JMX
MBeans. This fixes by moving the task of backup data updation for
usage data to the usage server by publishing usage events instead of
switching between databases in a local thread while in a
ManagedContextRunnable.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Root cause:
Even though dynamic scaling job is handled in vmworkjob queue which ensures serilizing multiple jobs but the database updating and generating usage events are out of the job queue.
Solution:
Moved all updations into the job queue
Firstly I have tested all the scenarios to check if nothing is broken:
Scaling on a running VM with normal compute offering
Scaling on a stopped VM with normal compute offering
Scaling on a running VM with custom compute offering
Scaling on stopped VM with custom compute offering
Scaling on stopped/running VM between custom compute offering and normal compute offering and combinations among these. Checked if the custom parameters have been populated or deleted accordingly based on the offering to which the VM is scaled
Since this is a corner scenario I could not test the exact point where two usage events are recorded at the same time for two different API calls on same VM.
Fix to pick the max data volumes limit using the actual hypervisor version, instead of "default" version. Use the hypervisor version in the host table when product_version parameter in host details doesn't exist or is empty
Fixes#4101
In the list publicipaddress api call, display the network
name if ip is associated to shared network
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
Allow VR's to be searched using its redundant state
Under Infrastructure -> Virtual Routers -> Search box
we can search using "MASTER", "BACKUP" and this will display
the VR's matching the state.
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
Only admins should be able to search VM by instance name
Customers should not see or serach VM's using the instance name (i-)
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
This update turns on certificate revocation checking for uploaded certificates:
- Updated `CertServiceImpl` to be able to enable revocation checking.
- Introduced a new parameter `ENABLED_REVOCATION_CHECK` for `UploadSslCertCmd`.
- Updated `CertServiceTest`.
Even if no CLRs are specified via `PKIXParameters`, the certificates
themselves may still provide info for revocation checking:
- The AIA extension may contains a URL to the OCSP responder.
- The CLRDP extension contains a URL to the CLR.
Those extensions may need to be explicitly enabled by setting the system properties `com.sun.security.enableAIAcaIssuers` and `com.sun.security.enableCRLDP` to true. See [Java PKI Programmer's Guide](https://docs.oracle.com/en/java/javase/11/security/java-pki-programmers-guide.html).
Using a revoked certificate may be dangerous. One of the most common reasons why a certificate authority (CA) revokes a certificate is that the private key has been compromised. For example, the private key might have been stolen by an adversary.
If I understand correctly, the `CertServiceImpl` bean is used for operations with certificates on a load balancer. In particular, it validates a certificate chain without revocation checking while uploading a certificate. If a compromised revoked certificate is then used by the load balancer, then it may result to compromising TLS connections. However, the attacker has to be able to implement man-in-the-middle attack to compromise the connections. So the attacker has to be quite powerful. Therefore, such an attack is definitely not easy to implement. On the other hand, the impact may be significant because of loss of confidentiality.
This has been discussed on security@cloudstack.apache.org
* 4.13:
Snapshot deletion issues (#3969)
server: Cannot list affinity group if there are hosts dedicated… (#4025)
server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)
Though VMware does not support security groups, but in a basic zone with VMware and no isolation VMs should be able to deploy.
Root cause:
In case of VMware and basic zone control nic is set to 0.0.0.0 assuming control network will be shared with guest network.
But to have access to VMware instances management/private needs to be assigned to it.
Solution:
Assing a private ip even in case of basic zone VMware.
The listZonesMetrics does not return same keys are listZones as the
default response view is restricted. This fixes that by ensuring that
for root admin full response view is used.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Validate API IOPS normal and maximum read/write values.
Ensures that normal read/write cannot be greater than Maximum
read/write. Additionally, it was added a global settings
'iops.maximum.rate.length'.
'iops.maximum.rate.length' sets the maximum IOPS read/write length
(seconds) accepted; thus, preventing irrealistic values for a disk
offering (e.g. hours or days of burst IOPS). The default value is 0
(zero) and allows any IOPS maximum rate length. Example:
iops.maximum.rate.length = 3600 sets the maximum IOPS length
accepted for a disk offering as 3600 seconds (60 minutes).
* Fix log String.format message from %s to %d
* Add bytes rate validation
* Refactoring to cover Read/Write Bytes and IOPS length validation
* Fix "copy-paste" issue with bytes write rate max length
Change Response view to Full for Admin user
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
As previously described by PR #3929:
If vm has attached ISO, the migration fails with error message "org.libvirt.LibvirtException: Cannot access storage file /mnt/b33e5a1d-e4ea-3465-b6ac-c98dc8ff8af0/207-2-cc5fd717-2d57-3bb3-bcf6-2c930268db6c.iso"