cloudstack/server/src/com/cloud/ha
prachi 97bdb58b6d Bug 11404 - VM was in Running state, had null for a pod_id, basically didn't allow creation of subsequent VMs
Reviewed-by: Alex

Changes:
- When the management server starts, it goes through all the pending work items in the op_it_work table and schedules HA work for each. It used to mark each item as done at that point. Instead, we keep the item pending and let it be marked Done after the HA work is done.
- Changes in VirtualMachineMgr::advanceStop():
a) If we find a VM with a null hostId, we stop the VM only if the stop is forced.
b) If the VM state transition to Stopping fails: for the Starting and Migrating states, we try to find the pending work item and then clean up the VM. If the state is already Stopping, we can clean up directly.
c) We proceed to release all resources only if the state transitioned to 'Stopping'.
- Changes in HA:
a) Depend on VirtualMachineMgr::advanceStop() to do VM cleanup when the host is not found.
- When the VM state synced between the management server and the agent goes from Starting -> Running, mark any pending work item as done.
2011-09-15 18:47:05 -07:00
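
The advanceStop() rules a) to c) above can be read as a small decision table. The following is only a sketch of that reading, not the CloudStack code: the class, enum, and parameter names are hypothetical, and returning FAIL when no pending work item is found (or for states the commit message does not mention) is an assumption.

```java
/** Hypothetical sketch of the advanceStop() rules; none of these names are CloudStack API. */
class AdvanceStopSketch {
    enum State { STARTING, RUNNING, MIGRATING, STOPPING, STOPPED }
    enum Action { SKIP, CLEANUP_VIA_WORK_ITEM, CLEANUP_DIRECTLY, RELEASE_RESOURCES, FAIL }

    /**
     * @param current                state of the VM before the stop was attempted
     * @param hostId                 host the VM is recorded on, or null
     * @param forced                 whether the stop was requested as a forced stop
     * @param transitionedToStopping whether the state machine accepted the move to Stopping
     * @param hasPendingWorkItem     whether a pending work item was found for the VM
     */
    static Action decide(State current, Long hostId, boolean forced,
                         boolean transitionedToStopping, boolean hasPendingWorkItem) {
        // Rule a): a VM with a null hostId is stopped only when the stop is forced.
        if (hostId == null && !forced) {
            return Action.SKIP;
        }
        // Rule c): resources are released only after a successful move to Stopping.
        if (transitionedToStopping) {
            return Action.RELEASE_RESOURCES;
        }
        // Rule b): the transition failed, so the next step depends on the current state.
        switch (current) {
            case STARTING:
            case MIGRATING:
                // Assumption: without a pending work item there is nothing to clean up against.
                return hasPendingWorkItem ? Action.CLEANUP_VIA_WORK_ITEM : Action.FAIL;
            case STOPPING:
                return Action.CLEANUP_DIRECTLY;
            default:
                // Assumption: other states are not covered by the commit message.
                return Action.FAIL;
        }
    }
}
```

For example, under these assumptions a non-forced stop of a VM with a null hostId is skipped regardless of its state, while a failed transition for a VM already in Stopping goes straight to cleanup.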
..
dao bug 10094: The problem was that we added code that won't add any more HA work items if the VM already has one. However, that is wrong. HA Manager stores the existing snapshot of the VM state machine. Before working on HA for a VM, it checks whether that snapshot has changed. So by not scheduling HA work, we effectively made HA not work in multi-failure situations. I fixed this by removing that code and instead, at the time of performing HA, doing a quick check to see whether there is work underway for the same VM and work scheduled in the future for that VM. If there is work scheduled in the future, we simply cancel the current work. If there is already work underway, we retry in 1 minute. 2011-06-12 09:18:21 -07:00
AbstractInvestigatorImpl.java 1. Added a timeout to the Command class so that each command can configure its own timeout; if no timeout is configured, the default timeout of 30 minutes is used 2011-09-07 19:18:36 -07:00
CheckOnAgentInvestigator.java 1. Added a timeout to the Command class so that each command can configure its own timeout; if no timeout is configured, the default timeout of 30 minutes is used 2011-09-07 19:18:36 -07:00
FenceBuilder.java Moved DAO to server 2010-11-22 07:40:41 -08:00
HaWorkVO.java added cluster awareness to vm start/stop 2011-02-11 17:03:04 -08:00
HighAvailabilityManager.java HA: no need to investigate why a VM was stopped on a host when the host is being Disconnected with the investigate=false option 2011-04-22 13:38:25 -07:00
HighAvailabilityManagerExtImpl.java full opensource 2011-08-23 19:23:49 -07:00
HighAvailabilityManagerImpl.java Bug 11404 - VM was in Running state, had null for a pod_id, basically didn't allow creation of subsequent VMs 2011-09-15 18:47:05 -07:00
Investigator.java Moved DAO to server 2010-11-22 07:40:41 -08:00
KVMFencer.java Add license header to files 2011-04-14 11:23:14 -07:00
ManagementIPSystemVMInvestigator.java Propagating 1345af2a0e84684a804bde5b281c30df72f148a0 2011-05-10 05:52:39 -07:00
RecreatableFencer.java migrate premium to oss 2011-01-28 16:07:46 -08:00
UserVmDomRInvestigator.java 1. Added a timeout to the Command class so that each command can configure its own timeout; if no timeout is configured, the default timeout of 30 minutes is used 2011-09-07 19:18:36 -07:00
VmwareFencer.java add VmwareInvestigator and VmwareFencer, use short worker VM name to avoid vCenter truncation 2011-09-14 15:14:36 -07:00
VmwareInvestigator.java Let VmwareInvestigator return fake but meaningful investigation result 2011-09-14 17:03:39 -07:00
XenServerFencer.java propagate b3aea1878395af343e18382b7f1c376b5be04567 2011-05-10 05:48:29 -07:00
XenServerInvestigator.java 1. Added a timeout to the Command class so that each command can configure its own timeout; if no timeout is configured, the default timeout of 30 minutes is used 2011-09-07 19:18:36 -07:00
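
The bug 10094 note on the dao row describes a check made when HA work is about to be performed: cancel if later work is scheduled for the same VM, retry in 1 minute if other work is already underway. A minimal sketch of that check follows. The class, field, and method names are hypothetical and not taken from the CloudStack sources, and checking for future work before in-progress work is an assumption based on the order in the message.

```java
import java.util.List;

/** Hypothetical sketch of the pre-HA check described for bug 10094; not CloudStack code. */
class HaWorkCheckSketch {
    enum Decision { CANCEL, RETRY_LATER, PROCEED }

    /** Retry delay named in the commit message: 1 minute. */
    static final long RETRY_DELAY_MS = 60_000L;

    /** Simplified stand-in for a row of HA work; all fields are illustrative. */
    static final class WorkItem {
        final long vmId;
        final boolean inProgress;
        final long scheduledAtMs;

        WorkItem(long vmId, boolean inProgress, long scheduledAtMs) {
            this.vmId = vmId;
            this.inProgress = inProgress;
            this.scheduledAtMs = scheduledAtMs;
        }
    }

    /**
     * Decides what to do with the current work item for vmId, given the
     * other work items known to the manager (the current item is excluded).
     */
    static Decision decide(long vmId, long nowMs, List<WorkItem> otherItems) {
        boolean underway = false;
        for (WorkItem w : otherItems) {
            if (w.vmId != vmId) {
                continue; // work for other VMs does not affect this decision
            }
            if (!w.inProgress && w.scheduledAtMs > nowMs) {
                // Later work exists for this VM, so the current work is cancelled.
                return Decision.CANCEL;
            }
            if (w.inProgress) {
                underway = true;
            }
        }
        // Work already underway for this VM means the caller retries after RETRY_DELAY_MS.
        return underway ? Decision.RETRY_LATER : Decision.PROCEED;
    }
}
```

A caller that receives RETRY_LATER would reschedule the current item RETRY_DELAY_MS into the future; how rescheduling is persisted is outside this sketch.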