There are some VM deployment failures happening when multiple VMs are deployed at a time, failures mainly due to NetworkModel code that iterates over all the vlans in the pod. This causes each deployVM thread to hold the global lock on Network longer and cause delays. This delay in turn causes more threads to choose same host and fail since capacity is not available on that host.
Following are some changes required to be done to reduce delays during VM deployments which in turn causes some vm deployment failures when multiple VMs are launched at a time.
In Planner, remove the clusters that do not contain a host with matching service offering tag. This will save some iterations over clusters that dont have matching tagged host
In NetworkModel, do not query the vlans for the pod within the loop. Also optimized the logic to query the ip/ipv6
In DeploymentPlanningManagerImpl, do not process the affinity group if the plan has hostId provided.
Support access to a host’s out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.
Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.
This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host
For testing this feature `ipmisim` can be used:
https://pypi.python.org/pypi/ipmisim
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This reverts commit cd7218e241, reversing
changes made to f5a7395cc2.
Reason for Revert:
noredist build failed with the below error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project cloud-plugin-hypervisor-vmware: Compilation failure
[ERROR] /home/jenkins/acs/workspace/build-master-noredist/plugins/hypervisors/vmware/src/com/cloud/hypervisor/guru/VMwareGuru.java:[484,12] error: non-static variable logger cannot be referenced from a static context
[ERROR] -> [Help 1]
even the normal build is broken as reported by @koushik-das on dev list
http://markmail.org/message/nngimssuzkj5gpbz
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.
CLOUDSTACK-4762 : Enabling VGPU support for XenServer.
This feature is to enable the GPU-passthrough and vGPU functionality,
with the help of this feature, admins/users will be able to leverage
the GPU graphics unit power by deploying a virtul machine with GPU or
vGPU support or by changing the service offering of an existing VM
at any later point of time. There GPU/vGPU enabled VMs are able to run
graphical applications.
For now, this feature is only supported with XenServer hypervisor but
can be extended to add the support of other hypervisors.
Introduction of a new Transaction API that is more consistent with the style
of Spring's transaction managment. The existing Transaction class was renamed
to TransactionLegacy. All of the non-DAO code in the management server has been
updated to use the new Transaction API.
The time increased due to the newly added dedicated resources feature. During regular VM deployment, all dedicated resources are put in avoid list so that they are not considered for deployment.
Now the way to compute the list of dedicated resources is not optimal and performance deteriorates in an environment having lot of pods, clusters and hosts as the logic is to query db. for each suc resource.
The fix is to optimize the logic not to loop through all resources but get the list of each resource type in a single query.
Conflicts:
server/src/com/cloud/deploy/DeploymentPlanningManagerImpl.java
Changes:
- During host deletion, host entry in databse gets removed prior to the disconnect task getting processed.
- This causes the disconnect task to get NPE while trying to do the host state transition
Summary of changes in the fix
- Optimized host scan logic, now instead of iterating over each cluster host scan is done for a batch of clusters
- Made host scan task interval configurable
This feature allows a user to deploy VMs only in the resources dedicated to his account or domain.
1. Resources(Zones, Pods, Clusters or hosts) can be dedicated to an account or domain.
Implemented 12 new APIs to dedicate/list/release resources:
- dedicateZone, listDedicatedZones, releaseDedicatedZone for a Zone.
- dedicatePod, listDedicatedPods, releaseDedicatedPod for a Pod.
- dedicateCluster, listDedicatedClusters, releaseDedicatedCluster for a Cluster
- dedicateHost, listDedicatedHosts, releaseDedicatedHost for a Host.
2. Once a resource(eg. pod) is dedicated to an account, other resources(eg. clusters/hosts) inside that cannot be further dedicated.
3. Once a resource is dedicated to a domain, other resources inside that can be further dedicated to its sub-domain or account.
4. If any resource (eg.cluster) is dedicated to a account/domain, then resources(eg. Pod) above that cannot be dedicated to different accounts/domain (not belonging to the same domain)
5. To use Explicit dedication, user needs to create an Affinity Group of type 'ExplicitDedication'
6. A VM can be deployed with the above affinity group parameter as an input.
7. A new ExplicitDedicationProcessor has been added which will process the affinity group of type 'Explicit Dedication' for a deployment of a VM that demands dedicated resources.
This processor implements the AffinityGroupProcessor adapter. This processor will update the avoid list.
8. A VM requesting dedication will be deployed on dedicatd resources if available with the user account.
9. A VM requesting dedication can also be deployed on the dedicated resources available with the parent domains iff no dedicated resources are available with the current user's account or
domain.
10. A VM (without dedication) can be deployed on shared host but not on dedicated hosts.
11. To modify the dedication, the resource has to be released first.
12. Existing Private zone functionality has been redirected to Explicit dedication of zones.
13. Updated the db upgrade schema script. A new table "dedicated_resources" has been added.
14. Added the right permissions in commands.properties
15. Unit tests: For the new APIs and Service, added unit tests under : plugins/dedicated-resources/test/org/apache/cloudstack/dedicated/DedicatedApiUnitTest.java
16. Marvin Test: To dedicate host, create affinity group, deploy-vm, check if vm is deployed on the dedicated host.