During vmsync if StopCommand (issued as part of PowerOff/PowerMissing report) fails to stop VM (since VM is running on HV),
don't transition VM state to "Stopped" in CS db. Also added a check to throw ConcurrentOperationException if vm state is not
"Running" after start operation.
Changes:
- When there is HA we try to redeploy the affected vm using regular planners and if that fails we retry using the special planner for HA (which skips checking disable threshold)
Now because of job framework the InsufficientCapacittyException gets masked and the special planners are not called. Job framework needs to be fixed to rethrow the correct exception.
- Also the VM Work Job framework is not setting the DeploymentPlanner to the VmWorkJob. So the HA Planner being passed by HAMgr was not getting used.
- Now the job framework sets the planner passed in by any caller of the VM Start operation, to the job
root cause:
when vmsync reports system VM is down, CCP doesn't release the VM resource before starting it.
fix:
make sure cleanup is called for a VM when it is reported as Stopped
Unnecessary exception in MS logs while removing default NIC from VM. Following changes are made:
1. Changed the exception from CloudRuntimeException to InvalidParameterValueExecption.
2. Moved out validation logic to UserVMManagerImpl from VirtualMachineManagerImpl.
3. Handling InvalidParameterValueException from async API calls so that they are not logged as ERROR in MS logs.
(outside cloudstack), the state of the vm is not updated in cloudstack db. The
ping task was not checking for resource (host) status by default. The power
state of the vms is returned as part of the resource status. Fixed the issue by
making sure ping task atleast tries once to get the resource status.
(cherry picked from commit 55b4ead495)
requires storage migration resulting in failure of VM migration. This also improves
the hostsformigration api. Firstly we were trying to list all hosts and then
finding suitable storage pools for all volumes and then we were checking whether
vm migration requires storage migration to that host. Now the process is updated.
We are checking for only those volumes which are not in zone wide primary store.
We are verifying by comparing volumes->poolid->clusterid to host clusterid. If it
uses local or clusterids are different then verifying whether host has suitable
storage pools for the volume of the vm to be migrated too.
Changes:
PodId in which the router should get started was not being saved to the DB due to the VO's setter method not following the setXXX format. So when planner loaded the router from DB, it always got podId as null and that would allow planner to deploy the router in any pod. If the router happens to start in a different pod than the user VM, the Vm fails to start since the Dhcp service check fails.
Fixed the VO's setPodId method, that was causing the DB save operation fail.
List of changes:
1. Created a separate thread pool for handling cron and ping tasks. The size of the pool is based on direct.agent.pool.size. The existing direct agent pool will run all commands other than cron and ping.
2. For normal tasks (generated as part of user/admin API calls), if throttle limit is reached then tasks get queued up for subsequent execution once threads are available.
3. For cron and ping tasks (internally generated by MS like ping, VM sync etc.), if throttle limit is reached then these gets rejected. Since these are internally generated these can be rejected without any issues.
Added a new flag 'checkBeforeCleanup' to StopCommand based on which check is done to see if VM is running in HV host.
If VM is running then in this case it is not stopped and the operation bails out.
Also modified the MS code to call the StopCommand with appropriate value for the flag based on the context.
Currently it is only set to 'true' when called from the new vmsync logic based on powerstate of VM. For rest it
is set to 'false' meaning no change in behaviour.