There are some VM deployment failures happening when multiple VMs are deployed at a time, failures mainly due to NetworkModel code that iterates over all the vlans in the pod. This causes each deployVM thread to hold the global lock on Network longer and cause delays. This delay in turn causes more threads to choose same host and fail since capacity is not available on that host.
Following are some changes required to be done to reduce delays during VM deployments which in turn causes some vm deployment failures when multiple VMs are launched at a time.
In Planner, remove the clusters that do not contain a host with matching service offering tag. This will save some iterations over clusters that dont have matching tagged host
In NetworkModel, do not query the vlans for the pod within the loop. Also optimized the logic to query the ip/ipv6
In DeploymentPlanningManagerImpl, do not process the affinity group if the plan has hostId provided.
Bug-ID: CLOUDSTACK-8880: calculate free memory on host before deploying Vm. free memory = total memory - (all vm memory)With memory over-provisioning set to 1, when mgmt server starts VMs in parallel on one host, then the memory allocated on that kvm can be larger than the actual physcial memory of the kvm host.
Fixed by checking free memory on host before starting Vm.
Added test case to check memory usage on Host.
Verified Vm deploy on Host with enough capacity and also without capacity
* pr/847:
Bug-ID: CLOUDSTACK-8880: calculate free memory on host before deploying Vm. free memory = total memory - (all vm memory)
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9794: Unable to attach more than 14 devices to a VMUpdated hardcoded value with max data volumes limit from hypervisor capabilities.
* pr/1953:
CLOUDSTACK-9794: Unable to attach more than 14 devices to a VM
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-5806: add presetup to storage types that support over provisioning
Ideally this should be configurable via global settings
* pr/1958:
CLOUDSTACK-5806: add presetup to storage types that support over provisioning Ideally this should be configurable via global settings
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9698 [VMware] Make hardcorded wait timeout for NIC adapter hotplug as configurableJira
===
CLOUDSTACK-9698 [VMware] Make hardcoded wait timeout for NIC adapter hotplug as configurable
Description
=========
Currently ACS waits for 15 seconds (hard coded) for hot-plugged NIC in VR running on VMware to get detected by guest OS.
The time taken to detect hot plugged NIC in guest OS depends on type of VMware NIC adapter like (E1000, VMXNET3, E1000e etc.)
and guest OS itself. In uncommon scenarios the NIC detection may take longer time than 15 seconds,
in such cases NIC hotplug would be treated as failure which results in VPC tier configuration failure.
Alternatively making the wait timeout for NIC adapter hotplug as configurable will be helpful for admins in such scenarios. This is specific to VR running over VMware hypervisor.
Also in future if VMware introduces new NIC adapter types which may take time to get detected by guest OS, it is good to have flexibility of
configuring the wait timeout as fallback mechanism in such scenarios.
Fix
===
Introduce new configuration parameter (via ConfigKey) "vmware.nic.hotplug.wait.timeout" which is "Wait timeout (milli seconds) for hot plugged NIC of VM to be detected by guest OS." as fallback instead of hard coded timeout, to ensure flexibility for admins given the listed scenarios above.
Signed-off-by: Sateesh Chodapuneedi <sateesh.chodapuneedi@accelerite.com>
* pr/1861:
CLOUDSTACK-9698 Make the wait timeout for NIC adapter hotplug as configurable
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
* 4.9:
moved logrotate from cron.daily to cron.hourly for vpcrouter in cloud-early-config
CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agent
[4.9] CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agentThe router.aggregation.command.each.timeout in global configuration is only applied on new created KVM host.
For existing KVM host, changing the value will not be effective.
We need to propagate the configuration to existing host when cloudstack-agent is connected.
* pr/1856:
CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agent
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9821: Fixed issue in deploying vm in basic zoneFixed issue in deploying vm in basic zone.
There is issue in ipset command with xenserver 6.5. In util.pread2 ipset and -N is passed as single string and it caused the issue in command failure.
util.pread2(['/bin/bash', '-c', 'ipset', '-N ', tmpname , type])
* pr/1991:
CLOUDSTACK-9821: Fixed issue in deploying vm in basic zone
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
Currently ACS waits for 15 seconds (hard coded) for hot-plugged NIC in VR to get detected by guest OS.
The time taken to detect hot plugged NIC in guest OS depends on type of NIC adapter like (E1000, VMXNET3, E1000e etc.)
and guest OS itself. In uncommon scenarios the NIC detection may take longer time than 15 seconds,
in such cases NIC hotplug would be treated as failure which results in VPC tier configuration failure.
Alternatively making the wait timeout for NIC adapter hotplug as configurable will be helpful for admins in such scenarios.
Also in future if VMware introduces new NIC adapter types which may take time to get detected by guest OS, it is good to have flexibility of
configuring the wait timeout as fallback mechanism in such scenarios.
Signed-off-by: Sateesh Chodapuneedi <sateesh.chodapuneedi@accelerite.com>
CLOUDSTACK-9784 : GPU detail not displayed in GPU tab of management server UI.ISSUE
==================
When GPU tab of the host is selected on the management server UI, no GPU detail is displayed.
RESOLUTION
==================
In the javascript file "system.js" while fetching the GPU details, sort functionality in dataprovider is returning value as undefined and hence it throwing an exception. So handled the output as undefined gracefully to avoid exception.
**Screenshot before applying fix :**

**Screenshot after applying fix :**

* pr/1942:
CLOUDSTACK-9784 : GPU detail not displayed in GPU tab of management server UI.
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK 9601: Upgrade: change logic for update path for filesFor going from version A to version D, it uses to run the SQL files in
that order: A -> B -> C -> D -> A-cleanup -> B-cleanup -> C-cleanup ->
D-cleanup. If you had upgraded each version separatively you would have
run A -> A-cleanup -> B -> B-cleanup -> C -> C-cleanup -> D ->
D-cleanup.
This change the logic to follow the same path if you are jumping over
versions.
Signed-off-by: Marc-Aurle Brothier <m@brothier.org>
* pr/1768:
Upgrade: change logic for update path for files
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-8841: Storage XenMotion from XS 6.2 to XS 6.5 fails.Removed Host version check in API. Because
Case 1:(Lower to Higher Version)
Migration from lower version to higher version is valid.
Case 2:(Higher to Lower Version)
In this case system(Host) will not allow.
So no need to check version in API. Additionally, CLOUDSTACK User Interface(UI) does not allow migration between different version of hyper-visors. But sometimes user wants to do migration from Lower to Higher Version. Now he can do it via API.
ACS Link ==>
https://issues.apache.org/jira/browse/CLOUDSTACK-8841
* pr/815:
CLOUDSTACK-8841: Storage XenMotion from XS 6.2 to XS 6.5 fails.
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
Security group ingress/egress issues with xenserver 6.2There is issue with the xenserver 6.2 ipset type nethash. Fixed it by adding nethash for ipset version 6 which is xenserver 6.5. For ipset version 4.x use iptreemap.
1. Tested configuring egress/ingress rules.
2. Tested the traffic for the configured rules from the VM.
* pr/843:
CLOUDSTACK-8871: fixed issue with the xenserver 6.2 ipset nethash
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9660: NPE while destroying volumes during 1000 VMs deploy and destroy tests
NPE is seen as VM destroy and storage cleanup threads try to remove the same root volume. Fix is to handle
only non-root volumes in storage cleanup thread, root volumes will be handled as part of VM destroy.
* pr/1825:
CLOUDSTACK-9660: NPE while destroying volumes during 1000 VMs deploy and destroy tests NPE is seen as VM destroy and storage cleanup threads try to remove the same root volume. Fix is to handle only non-root volumes in storage cleanup thread, root volumes will be handled as part of VM destroy.
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
Fix public IPs not being removed from the VR when deprovisionedThis PR replaces #1706. It does not remove the IP from the database, but it does deprovision the IP correctly from the VR when the public IP is removed.
* pr/1907:
Fix public IPs not being removed from the VR when deprovisioned
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9757: Fixed issue in traffic from additional public subnetAcquire ip from additional public subnet and configure nat on that ip.
After this pick any from that network and access additional public subnet from this vm. Traffic is supposed to go via additional public subnet interface in the VR.
* pr/1922:
CLOUDSTACK-9757: Fixed issue in traffic from additional public subnet
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
* 4.9:
CLOUDSTACK-9746 system-vm: logrotate config causes critical failures
CLOUDSTACK-9788: Fix exception listNetworks with pagesize=0
CLOUDSTACK-8663: Fixed various issues to allow VM snapshots and volume snapshots to exist together
Fix HVM VM restart bug in XenServer
CLOUDSTACK-9746 system-vm: logrotate config causes critical failures* rotate both daily and by size by using maxsize in stead of size
* decrease the max size to 10M for rsyslog files
* remove delaycompress for rsyslog files
* increase rotate to 10 for cloud.log
* pr/1915:
CLOUDSTACK-9746 system-vm: logrotate config causes critical failures
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9363: Fix HVM VM restart bug in XenServerHere is the longer description of the problem:
By default XenServer limits HVM guests to only 4 disks. Two of those are reserved for the ROOT disk (deviceId=0) and CD ROM (device ID=3) which means that we can only attach 2 data disks. This limit however is removed when Xentools is installed on the guest. The information that a guest has Xentools installed and can handle more than 4 disks is stored in the VM metadata on XenServer. When a VM is shut down, Cloudstack removes the VM and all the metadata associated with the VM from XenServer. Now, when you start the VM again, even if it has Xentools installed, it will default to only 4 attachable disks.
Now this problem manifests itself when you have a HVM VM and you stop and start it with more than 2 data disks attached. The VM fails to start and the only way to start the VM is to detach the extra disks and then reattach them after the VM start.
In this fix, I am removing the check which is done before creating a `VBD` which enforces this limit. This will not affect current workflow and will fix the HVM issue.
@koushik-das this is related to the "autodetect" feature that you introduced a while back (https://issues.apache.org/jira/browse/CLOUDSTACK-8826). I would love your review on this fix.
* pr/1829:
Fix HVM VM restart bug in XenServer
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-8663: Fixed various issues to allow VM snapshots and volumesnapshots to exist together
Reverting VM to disk only snapshot in Xenserver corrupts VM
Stale NFS secondary storage on XS leads to volume creation failure from snapshot
Fixed various concerns raised in #672
* pr/1941:
CLOUDSTACK-8663: Fixed various issues to allow VM snapshots and volume snapshots to exist together
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
[CLOUDSTACK-9793] Faster IP in subnet checkThis change removes the conversion from IPNetwork to list in one of the router scripts. This makes the router faster at processing static NAT rules, which can prevent timeouts when attaching or detaching IPs.
With the `list` conversion, it has to potentially check a list of 65536 IP strings multiple times. We assume that the comparison implemented in the IPNetwork is far more efficient. We have seen speed-up from 218 seconds to enable static NAT with 18 IPs on the router to 2 or 3 seconds by removing this cast. This also fixes a potential bug where adding IPs to a router time out because the scripts are taking too long. 218 seconds, for example, is beyond the timeout on the KVM agent for script execution, and then all enableStaticNat operations will fail.
* pr/1948:
CLOUDSTACK-9793: Faster ip in subnet check
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9795: moved logrotate from cron.daily to cron.hourly for vpcrouter moved logrotate from cron.daily to cron.hourly for vpcrouter in cloud-early-config. This brings 'vpcrouter' inline with 'router'. We are having issues with cloud.log not rotating fast enough, which filled up /var/log and ultimately caused the VR to stop functioning in such a way that it prevented new VMs from being deployed.
* pr/1954:
moved logrotate from cron.daily to cron.hourly for vpcrouter in cloud-early-config
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
ipv6: Set IPv6 CIDR and Gateway in 'nic' profileWithout this information a NPE might be triggered when starting a VR, SSVM or CP
as this information is read from the 'nics' table and causes a NPE.
During deployment we should set the IPv6 Gateway and CIDR for the NIC object so that
it is persisted to the database.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* pr/1927:
ipv6: Set IPv6 CIDR and Gateway in 'nic' profile
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-8857 listProjects doesn't return tags vmstopped or vmrunning when their value is zero listProjects doesn't return tags vmstopped or vmrunning when their value is zero
added the the appropriate tags to response.
tested this manually by creating projects, launching vms from project accounts and then listing the projects.
* pr/838:
CLOUDSTACK-8857 listProjects doesn't return tags vmstopped or vmrunning when their value is zero
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-8856 Primary Storage Used(type tag with value 2) related tPrimary Storage Used(type tag with value 2) related tag is not showing in listCapacity api response
* pr/865:
CLOUDSTACK-8856 Primary Storage Used(type tag with value 2) related tag is not showing in listCapacity api response.
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
* 4.9:
CLOUDSTACK-9789: Fix releasing secondary guest IP fails with associated static nat which is actually not used
CLOUDSTACK-9628: Use correct virtualsize with Swift as secondary storage
CLOUDSTACK-9789: Fix releasing secondary guest IP fails with associated static nat which is actually not used
* pr/1947:
CLOUDSTACK-9789: Fix releasing secondary guest IP fails with associated static nat which is actually not used
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9628: Fix Template Size in Swift as Secondary StorageCloudstack incorrectly uses the physical size as the size of the
template. Ideally, the size should refelct the virtual size. This
PR fixes that issue.
* pr/1770:
CLOUDSTACK-9628: Use correct virtualsize with Swift as secondary storage
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-8324: config drive data set/get scripts for the guest vmAdded the guest vm scripts for set/get the vm data, password and ssh keys
* pr/1379:
CLOUDSTACK-8324: updated the mount directory name and kvm virt device
CLOUDSTACK-8324: config drive data set/get scripts for the guest vm
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
CLOUDSTACK-9724: Fixed missing additional public ip on tier network wIn VPC tier network acquire an ip and configure the PF service on it. VR now will have the two ip addresses on the interface.
Now restart the VPC tier network with cleanup option. After router comes up the public interface has only one ip (source nat ip)
Fixed the above issue.
* pr/1885:
CLOUDSTACK-9724: Fixed missing additional public ip on tier network with cleanup
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
* rotate both daily and by size by using maxsize in stead of size
* decrease the max size to 10M for rsyslog files
* remove delaycompress for rsyslog files
* increase rotate to 10 for cloud.log