Commit Graph

238 Commits

Author SHA1 Message Date
Edison Su 5fade1ff43 bug 13416: backport patch from master to 2.2.14, need to restart cloud-agent on kvm host if cancelmaintenance command is sent
status 13416: resolved fixed
Reviewed-by: frank
2012-02-02 15:17:35 -08:00
anthony 2d6c426775 bug 12844: fixed merge
reviewed-by : edison
2012-02-01 11:32:35 -08:00
anthony e1aa9c0ead bug 12844: fixed a regression
reviewed-by : edison
2012-01-31 17:09:41 -08:00
anthony c530cbad2a bug 12844, 13394: 1. if connect to host fails, don't need to investigate
2. add ha parameter to disconnect host to indicate if HA VMs on this host

status 12844, 13394: resolved fixed

reviewed-by : edison
2012-01-31 15:23:07 -08:00
Alena Prokharchyk 57cc61396d Schedule HA is a part of handleDisconnect, not removeAgent
Reviewed-by: Alex Huang
2012-01-31 10:24:21 -08:00
abhi 606708e0a3 bug 12849: remove agent will kickstart HA if the host status is Down or Alert. The update is therefore moved before it
reviewed by: kishan
2012-01-30 18:10:37 +05:30
Edison Su d9287f0e43 tell agent to reconnect to mgt server, if cancelmaintenance cmd is called 2012-01-19 17:14:00 -08:00
Edison Su d910b7f85d bug 12622: start ha with vm investigation when host is disconnected
status 12622: resolved fixed
2012-01-05 14:48:46 -08:00
Edison Su 6ecb0f2b6b bug 12616: 40 hosts connecting to mgt server, need to set workers > 40 in mgt server.
status 12616: resolved fixed
2012-01-04 21:36:40 -08:00
Edison Su 4a7c684526 bug 12616: advanced startup command for direct connected agent
status 12616: resolved fixed
2012-01-03 18:29:40 -08:00
Alena Prokharchyk 4439fd8a51 bug 12790: use processDisconnect() when disconnecting the agent during agent LB process
status 12790: resolved fixed
2011-12-29 16:56:46 -08:00
anthony 1ba2d1c8d5 add more logs 2011-11-02 17:04:18 -07:00
Kelven Yang 3aba30543c bug 11624: command via AgentManagerImpl.sendTo() needs to be redirected to HypervisorGuru for command filtering, the filtering mechanism is required by VMware hypervisor to redirect storage/snapshot commands to SSVM 2011-10-17 18:03:54 -07:00
anthony 0bdd6ded96 timeout is not set for some commands 2011-09-29 12:17:08 -07:00
prachi 97bdb58b6d Bug 11404 - VM was in Running state, had null for a pod_id, basically didn't allow creation of subsequent VMs
Reviewed-by: Alex

Changes:
- When management server starts, it goes through all the pending work items from op_it_work table and schedules HA work for each. It used to mark each item as done. Instead we should keep the item as pending and let it get marked as Done after the HA work is done.
- Changes in VirtualMachineMgr::advanceStop() :
a) if we find a VM with null hostId, we stop the VM only if it is forced stopped.
b) if VM state transition to Stopping fails, for states Starting and Migrating we try to find the pending work item and then clean up the VM. In case the state is Stopping we can clean up directly.
c) We proceed releasing all resources only if state transitioned to 'Stopping'.
- Changes in HA:
a) Depend on VirtualMachineMgr::advanceStop() in case host is not found to do VM cleanup
- When VM state between mgmt server and agent syncs from starting -> running, mark any pending work item as done.
2011-09-15 18:47:05 -07:00
anthony a308823549 bug 11413: when marking host as disconnected, set lastping to now - pingtimeout
status 11413: resolved fixed
2011-09-12 18:46:58 -07:00
anthony a369885a0f 1. added timeout in Command Class, so each command can configure its own timeout; if the timeout is not configured, use the default timeout, which is 30 minutes
2. added the following configurable timeouts
       PrimaryStorageDownloadWait("Storage", TemplateManager.class, Integer.class, "primary.storage.download.wait", "10800", "In second, timeout for download template to primary storage", null),
       CreateVolumeFromSnapshotWait("Storage", StorageManager.class, Integer.class, "create.volume.from.snapshot.wait", "10800", "In second, timeout for create template from snapshot", null),
       CopyVolumeWait("Storage", StorageManager.class, Integer.class, "copy.volume.wait", "10800", "In second, timeout for copy volume command", null),
       CreatePrivateTemplateFromVolumeWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.volume.wait", "10800", "In second, timeout for CreatePrivateTemplateFromVolumeCommand", null),
       CreatePrivateTemplateFromSnapshotWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.snapshot.wait", "10800", "In second, timeout for CreatePrivateTemplateFromSnapshotCommand", null),
       BackupSnapshotWait("Storage", StorageManager.class, Integer.class, "backup.snapshot.wait", "10800", "In second, timeout for BackupSnapshotCommand", null),
2011-09-07 19:18:36 -07:00
anthony 9842a9aed3 bug 10078:
1. introduce migratewait in global configuration, the default value is 1 hour
2. use async xapi VM migration API

status 10078: resolved fixed
2011-09-07 12:40:30 -07:00
Kelven Yang a7ac75f920 bug 11304: restore host status after initialization failure 2011-09-02 15:17:57 -07:00
anthony 57e731b60e set timeout for CheckOnHostCommand to 50 s 2011-09-02 15:01:06 -07:00
frank 18f87c2108 Merge branch 'cvm' into 2.2.y
Conflicts:
	api/src/com/cloud/api/BaseCmd.java
	cloud.spec
	core/src/com/cloud/storage/template/DownloadManagerImpl.java
	server/src/com/cloud/agent/manager/AgentManagerImpl.java
	server/src/com/cloud/configuration/DefaultComponentLibrary.java
	server/src/com/cloud/deploy/FirstFitPlanner.java
	server/src/com/cloud/host/dao/HostDao.java
	server/src/com/cloud/network/security/SecurityGroupListener.java
	server/src/com/cloud/storage/StorageManagerImpl.java
	server/src/com/cloud/storage/listener/StoragePoolMonitor.java
	server/src/com/cloud/vm/UserVmManagerImpl.java
	server/src/com/cloud/vm/VirtualMachineManagerImpl.java
	utils/src/com/cloud/utils/SerialVersionUID.java
2011-08-19 16:08:35 -07:00
Murali Reddy 37512883f1 bug 11148: VMs that got stopped during Host Maintenance have host_id associated with them
status 11148: resolved fixed

enabled vm stop, if the host is last valid host in cluster
2011-08-17 18:11:23 +05:30
anthony 5f9884d97a Bug 10197:
1. don't try HA vms if host hypervisor version changes
2. fixed a bug related to VM full sync with hosttrack enabled
2011-08-02 16:48:27 -07:00
Alex Huang f043f63eaa Merged changes from 2.2.8.zucchini 2011-08-02 15:33:48 -07:00
anthony 7d02ed344e Bug 10197: do not check timeout against cluster which is not managed 2011-08-01 17:00:58 -07:00
Sheng Yang 6c493bfb82 Add exception message for AgentManagerImpl.investigate() 2011-07-27 10:53:06 -07:00
Sheng Yang 3a8e13f968 Add exception message for AgentManagerImpl.investigate() 2011-07-27 10:52:48 -07:00
Alex Huang c610925304 moved agent ping to in memory rather than db based 2011-07-25 15:21:06 -07:00
Alex Huang 10ac7753ed Switched ping to use the same db connection so that running out of db connections won't affect basic operations 2011-07-25 10:36:00 -07:00
Kelven Yang 3a6f3b71e0 bug 10791: add data integrity check upon management server startup 2011-07-21 17:08:29 -07:00
alena c21273d23a bug 10734: removed global lock in "DirectAgentScanTimerTask". This lock used to prevent the task from executing on multiple management servers simultaneously.
status 10734: resolved fixed
2011-07-21 16:18:43 -07:00
anthony 3881e13387 bug 10197:
The steps to upgrade XenServer:

1. put cluster in Unmanaged state through UI , then MS will not talk to hosts in the cluster
2. upgrade xenserver according to XenServer upgrade guide.
3. put cluster in Managed state through UI, then MS will reconnect hosts

TODO,

1. UI
2. vm pool sync , leveraged from kelven's work
2011-07-19 15:26:25 -07:00
alena c48c3edfbc bug 10271: don't include removed records when search for local storage pool
status 10217: resolved fixed
2011-07-19 11:10:53 -07:00
Alex Huang d54f6d536a propagating transaction isolation fix for merovingian2 2011-07-18 16:48:49 -07:00
alena 7a04334b60 bug 10734: removed global lock in "DirectAgentScanTimerTask". This lock used to prevent the task from executing on multiple management servers simultaneously.
status 10734: resolved fixed
2011-07-18 15:00:13 -07:00
Alex Huang e52a97b969 Switched ping to use the same db connection so that running out of db connections won't affect basic operations 2011-07-18 14:22:49 -07:00
anthony 18003deedf bug 10628: root cause is CheckHealthCommand return false, XenServerInvestigator is not called
status 10628: resolved fixed
2011-07-14 20:42:26 -07:00
anthony 468136be74 bug 9855: two fixes.
1. cannot cancel maintenance mode.
2. maintenance related modes are preserved through MS restart

status 9855: resolved fixed
2011-06-27 13:48:12 -07:00
alena 41f12eb642 Pass isForRebalance parameter to processConnect method of all the listeners - some listeners don't have to be notified when connection happens as a part of Agent Rebalance process (VirtualMachineManagerImpl listener for instance) 2011-06-27 10:20:41 -07:00
alena 0bf34f3612 bug 10447: don't notify VirtualMachineManager listener when do host rebalance - vm sync is not needed in this case.
status 10447: resolved fixed
2011-06-27 10:20:40 -07:00
Edison Su 3642aef4c6 bug 10423: agent in ssvm needs to add default keystore, as we are copying templates through https://**realhostip.**
status 10423: resolved fixed
2011-06-24 14:45:47 -04:00
Edison Su 28f0068151 add new option to force destroy vm when delete host, if the VMs are created on local storage 2011-06-23 20:36:13 -04:00
anthony 62249f3eae 1. return message to UI if adding primary storage failed
2. delete primary storage entry if adding primary storage failed
2011-06-22 18:44:33 -07:00
Edison Su ad5162ef86 fix ebtable cleanup issue: on ubuntu, it is not deleted if vm is stopped 2011-06-16 19:26:24 -04:00
Edison Su 2e8d1bbd6c bug 10190: add log if failed to delete host when host is in UP state 2011-06-15 12:02:31 -04:00
Kelven Yang 24c87c306b merge adding host fix from 2.2.4 2011-06-14 17:16:19 -07:00
Frank 379cbc1d55 Store all parameters of url call to BaseCmd.fullUrlParams so there will be no
changes in future API because all parameters can be retrieved from API command itself
2011-06-08 10:25:15 -07:00
alena 14cdc7de14 bug 9127: covered failure scenarios for agent LB.
status 9127: resolved fixed

The feature is completed; please file separate bugs if any issue arises during the testing.
Wiki link describing how agentLB works: http://intranet.lab.vmops.com/engineering/release-2.2-features/agent-load-balancing
2011-06-05 17:35:30 -07:00
Alex Huang 019cc78976 Fixes problems in routing between management servers 2011-06-05 16:06:54 -07:00
Alex Huang d9e0bcfa1e bug 10126: Renamed getPodId() to getPodIdToDeployIn() 2011-06-03 22:17:08 -07:00