Commit Graph

311 Commits

Author SHA1 Message Date
Edison Su 5fade1ff43 bug 13416: backport patch from master to 2.2.14; cloud-agent on the KVM host needs to be restarted if a cancel-maintenance command is sent
status 13416: resolved fixed
Reviewed-by: frank
2012-02-02 15:17:35 -08:00
anthony 2d6c426775 bug 12844: fixed merge
reviewed-by : edison
2012-02-01 11:32:35 -08:00
anthony e1aa9c0ead bug 12844: fixed a regression
reviewed-by : edison
2012-01-31 17:09:41 -08:00
anthony c530cbad2a bug 12844, 13394: 1. if connecting to the host fails, there is no need to investigate
2. add an ha parameter to disconnect host to indicate whether to HA the VMs on this host

status 12844, 13394: resolved fixed

reviewed-by : edison
2012-01-31 15:23:07 -08:00
Alena Prokharchyk 57cc61396d Schedule HA is a part of handleDisconnect, not removeAgent
Reviewed-by: Alex Huang
2012-01-31 10:24:21 -08:00
abhi 606708e0a3 bug 12849: remove agent will kickstart HA if the host status is Down or Alert. The update is therefore moved before it
reviewed by: kishan
2012-01-30 18:10:37 +05:30
Edison Su d9287f0e43 tell agent to reconnect to mgt server if the cancel-maintenance cmd is called 2012-01-19 17:14:00 -08:00
Edison Su d910b7f85d bug 12622: start ha with vm investigation when host is disconnected
status 12622: resolved fixed
2012-01-05 14:48:46 -08:00
Edison Su 6ecb0f2b6b bug 12616: 40 hosts connecting to mgt server, need to set workers > 40 in mgt server.
status 12616: resolved fixed
2012-01-04 21:36:40 -08:00
Edison Su 4a7c684526 bug 12616: advanced startup command for direct connected agent
status 12616: resolved fixed
2012-01-03 18:29:40 -08:00
Alena Prokharchyk 4439fd8a51 bug 12790: use processDisconnect() when disconnecting the agent during the agent LB process
status 12790: resolved fixed
2011-12-29 16:56:46 -08:00
anthony 1ba2d1c8d5 add more logs 2011-11-02 17:04:18 -07:00
Kelven Yang 3aba30543c bug 11624: commands sent via AgentManagerImpl.sendTo() need to be redirected to HypervisorGuru for command filtering; the filtering mechanism is required by the VMware hypervisor to redirect storage/snapshot commands to SSVM 2011-10-17 18:03:54 -07:00
anthony 0bdd6ded96 timeout is not set for some commands 2011-09-29 12:17:08 -07:00
prachi 97bdb58b6d Bug 11404 - VM was in Running state, had null for a pod_id, basically didn't allow creation of subsequent VMs
Reviewed-by: Alex

Changes:
- When management server starts, it goes through all the pending work items from op_it_work table and schedules HA work for each. It used to mark each item as done. Instead we should keep the item as pending and let it get marked as Done after the HA work is done.
- Changes in VirtualMachineMgr::advanceStop() :
a) if we find a VM with null hostId, we stop the VM only if it is forced stopped.
b) if the VM state transition to Stopping fails, for states Starting and Migrating we try to find the pending work item and then clean up the VM. If the state is Stopping we can clean up directly.
c) We proceed releasing all resources only if state transitioned to 'Stopping'.
- Changes in HA:
a) Depend on VirtualMachineMgr::advanceStop() in case host is not found to do VM cleanup
- When Vm state between mgmt server and agent syncs from starting -> running, mark any pending work item as done.
2011-09-15 18:47:05 -07:00
alena f4e22094e0 Do agent disconnect when agent rebalance fails
Reviewed-by: Alex Huang
2011-09-15 18:36:22 -07:00
anthony a308823549 bug 11413: when marking a host as disconnected, set lastping to now - pingtimeout
status 11413: resolved fixed
2011-09-12 18:46:58 -07:00
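A minimal sketch of the arithmetic in commit a308823549, assuming a ping monitor that treats a host as behind on ping once its last ping is at least one timeout interval in the past. The class and method names are hypothetical and are not taken from the CloudStack source.

```java
// Hypothetical illustration; names are not from the CloudStack code base.
class LastPingSketch {
    // Back-dates the last-ping timestamp by exactly one timeout interval.
    static long lastPingForDisconnectedHost(long nowSeconds, long pingTimeoutSeconds) {
        return nowSeconds - pingTimeoutSeconds;
    }

    // Assumed monitor rule: a host is behind on ping once the gap reaches the timeout.
    static boolean isBehindOnPing(long lastPingSeconds, long nowSeconds, long pingTimeoutSeconds) {
        return nowSeconds - lastPingSeconds >= pingTimeoutSeconds;
    }
}
```

Under that assumed rule, a host whose lastping is back-dated this way is picked up by the next timeout scan instead of waiting a full interval.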
keshav a0ab06b186 Excluded external firewall/LB from host stats collection, and included them in ping checks 2011-09-08 16:39:34 -07:00
anthony a369885a0f 1. added a timeout to the Command class so that each command can configure its own timeout; if no timeout is configured, the default timeout of 30 minutes is used
2. added following configurable timeout
       PrimaryStorageDownloadWait("Storage", TemplateManager.class, Integer.class, "primary.storage.download.wait", "10800", "In second, timeout for download template to primary storage", null),
       CreateVolumeFromSnapshotWait("Storage", StorageManager.class, Integer.class, "create.volume.from.snapshot.wait", "10800", "In second, timeout for create template from snapshot", null),
       CopyVolumeWait("Storage", StorageManager.class, Integer.class, "copy.volume.wait", "10800", "In second, timeout for copy volume command", null),
       CreatePrivateTemplateFromVolumeWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.volume.wait", "10800", "In second, timeout for CreatePrivateTemplateFromVolumeCommand", null),
       CreatePrivateTemplateFromSnapshotWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.snapshot.wait", "10800", "In second, timeout for CreatePrivateTemplateFromSnapshotCommand", null),
       BackupSnapshotWait("Storage", StorageManager.class, Integer.class, "backup.snapshot.wait", "10800", "In second, timeout for BackupSnapshotCommand", null),
2011-09-07 19:18:36 -07:00
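A minimal sketch of the fallback rule described in commit a369885a0f: a command uses its own timeout when one is configured and the 30-minute default otherwise. The class below is hypothetical and is not the actual Command class; treating null or a non-positive value as "not configured" is an assumption.

```java
// Hypothetical illustration; not the actual CloudStack Command class.
class CommandTimeoutSketch {
    // Default from the commit message: 30 minutes, expressed in seconds.
    static final int DEFAULT_TIMEOUT_SECONDS = 30 * 60;

    // Returns the command's own timeout when one is configured, else the default.
    // Treating null or a non-positive value as "not configured" is an assumption.
    static int effectiveTimeoutSeconds(Integer configuredSeconds) {
        if (configuredSeconds == null || configuredSeconds <= 0) {
            return DEFAULT_TIMEOUT_SECONDS;
        }
        return configuredSeconds;
    }
}
```

With the values listed in the commit, a command configured from "copy.volume.wait" would get 10800 seconds, while a command with no configured value would get 1800 seconds.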
alena 668276e22c bug 11326: don't try to transfer the agent if it's a forward agent
status 11326: resolved fixed

Also added more logging to the agent rebalance code.
2011-09-07 12:49:02 -07:00
anthony 9842a9aed3 bug 10078:
1. introduce migratewait in global configuration, the default value is 1 hour
2. use async xapi VM migration API

status 10078: resolved fixed
2011-09-07 12:40:30 -07:00
Kelven Yang a7ac75f920 bug 11304: restore host status after initialization failure 2011-09-02 15:17:57 -07:00
anthony 57e731b60e set timeout for CheckOnHostCommand to 50 s 2011-09-02 15:01:06 -07:00
prachi 089b23f7a6 Bug 9921 - template tags
Changes:
- CreateTemplate and RegisterTemplate now support adding a template tag. It is a string value. This is root-admin only action - only admin can add template tags.
- ListTemplates will return the template tag in response.
- HostAllocator changed to use template tag along with the existing tag on service offering. If both tags are present, allocator now finds hosts satisfying both tags. If no hosts have both tags, allocation will fail.
- DB changes to add new column to vm_template table.
- DB upgrade changes for upgrade from 2.2.10 to 2.2.11

Conflicts:

	server/src/com/cloud/api/ApiResponseHelper.java
	server/src/com/cloud/template/TemplateAdapterBase.java
	server/src/com/cloud/vm/UserVmManagerImpl.java
2011-08-25 15:18:18 -07:00
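A minimal sketch of the tag-matching rule described in commit 089b23f7a6: when both a service offering tag and a template tag are present, only hosts carrying both qualify, and an empty result means allocation fails. The class, method, and data layout are hypothetical and are not the actual HostAllocator code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the actual HostAllocator implementation.
class HostTagFilterSketch {
    // Returns the names of hosts that carry every non-null required tag.
    // hostTags maps a host name to the set of tags on that host.
    static List<String> hostsMatching(Map<String, Set<String>> hostTags,
                                      String offeringTag, String templateTag) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : hostTags.entrySet()) {
            Set<String> tags = entry.getValue();
            boolean offeringOk = offeringTag == null || tags.contains(offeringTag);
            boolean templateOk = templateTag == null || tags.contains(templateTag);
            if (offeringOk && templateOk) {
                matches.add(entry.getKey());
            }
        }
        return matches; // an empty list corresponds to a failed allocation
    }
}
```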
frank 18f87c2108 Merge branch 'cvm' into 2.2.y
Conflicts:
	api/src/com/cloud/api/BaseCmd.java
	cloud.spec
	core/src/com/cloud/storage/template/DownloadManagerImpl.java
	server/src/com/cloud/agent/manager/AgentManagerImpl.java
	server/src/com/cloud/configuration/DefaultComponentLibrary.java
	server/src/com/cloud/deploy/FirstFitPlanner.java
	server/src/com/cloud/host/dao/HostDao.java
	server/src/com/cloud/network/security/SecurityGroupListener.java
	server/src/com/cloud/storage/StorageManagerImpl.java
	server/src/com/cloud/storage/listener/StoragePoolMonitor.java
	server/src/com/cloud/vm/UserVmManagerImpl.java
	server/src/com/cloud/vm/VirtualMachineManagerImpl.java
	utils/src/com/cloud/utils/SerialVersionUID.java
2011-08-19 16:08:35 -07:00
alena 6291554576 bug 11154: host can go in Maintenance state only after all vms are migrated from it
status 11154: resolved fixed
2011-08-17 12:03:53 -07:00
Murali Reddy 37512883f1 bug 11148: VMs that got stopped during Host Maintenance have host_id associated with them
status 11148: resolved fixed

enabled vm stop, if the host is last valid host in cluster
2011-08-17 18:11:23 +05:30
alena 3945eec0df Fixed the bug in allocator where cluster was added to avoid set as pod 2011-08-15 10:43:59 -07:00
anthony 5f9884d97a Bug 10197:
1. don't try HA vms if host hypervisor version changes
2. fixed a bug related to VM full sync with hosttrack enabled
2011-08-02 16:48:27 -07:00
Alex Huang f043f63eaa Merged changes from 2.2.8.zucchini 2011-08-02 15:33:48 -07:00
anthony 7d02ed344e Bug 10197: do not check timeout against cluster which is not managed 2011-08-01 17:00:58 -07:00
frank b0b3f16dae Two things:
Load non-routing resources in ClusteredAgentManager, including External DHCP, PxeServer, ExternalFirewall, ExternalLoadBalancer

Bug 9887 - baremetal: support for image operation (create template from guest disk)

changes in line with UI
2011-07-29 11:28:09 -07:00
Alex Huang 6fea146903 more index. moved op_lock to memory table to try it 2011-07-27 14:06:40 -07:00
Sheng Yang 6c493bfb82 Add exception message for AgentManagerImpl.investigate() 2011-07-27 10:53:06 -07:00
Sheng Yang 3a8e13f968 Add exception message for AgentManagerImpl.investigate() 2011-07-27 10:52:48 -07:00
Alex Huang 9c627a15f3 Inaccurate clock now gets an mbean to control it 2011-07-25 16:01:31 -07:00
Alex Huang c610925304 moved agent ping to in memory rather than db based 2011-07-25 15:21:06 -07:00
Alex Huang 1b56808be5 brought over agent ping uses the same db connection 2011-07-25 10:57:00 -07:00
Alex Huang 10ac7753ed Switched ping to use the same db connection so that running out of db connections won't affect basic operations 2011-07-25 10:36:00 -07:00
Alex Huang 3f18192df8 Make all connections READ COMMITTED isolation level instead of setting it every time we get the db connection, which caused useless round trips 2011-07-23 14:58:32 -07:00
Alex Huang 12cd5db620 deleted a file by mistake 2011-07-22 11:39:16 -07:00
Alex Huang b59c6b4ab6 propagate lock table fix 2011-07-22 11:35:47 -07:00
Alex Huang 44ce9488a6 propagate lock table fixes 2011-07-22 11:30:23 -07:00
Kelven Yang 3a6f3b71e0 bug 10791: add data integrity check upon management server startup 2011-07-21 17:08:29 -07:00
alena c21273d23a bug 10734: removed global lock in "DirectAgentScanTimerTask". This lock was used to prevent the task from executing on multiple management servers simultaneously.
status 10734: resolved fixed
2011-07-21 16:18:43 -07:00
alena ee98887176 2 fixes for Agent Load Balancer:
* when a management server dies and notifies other management servers about this, the running management server has to clean up host_transfer records belonging to the dead management server
* issue agent load balancing task only when agent load (number of connected agents in the system) exceeds "agent.load.threshold" - 70% by default

Conflicts:

	server/src/com/cloud/configuration/Config.java
	server/src/com/cloud/host/dao/HostDaoImpl.java
	setup/db/db/schema-228to229.sql
2011-07-21 15:28:11 -07:00
alena 307741edcd 2 fixes for Agent Load Balancer:
* when a management server dies and notifies other management servers about this, the running management server has to clean up host_transfer records belonging to the dead management server
* issue agent load balancing task only when agent load (number of connected agents in the system) exceeds "agent.load.threshold" - 70% by default

Conflicts:

	server/src/com/cloud/configuration/Config.java
	setup/db/db/schema-228to229.sql
2011-07-21 15:27:50 -07:00
anthony 3881e13387 bug 10197:
Steps to upgrade XenServer:

1. put cluster in Unmanaged state through UI , then MS will not talk to hosts in the cluster
2. upgrade xenserver according to XenServer upgrade guide.
3. put cluster in Managed state through UI, then MS will reconnect hosts

TODO,

1. UI
2. vm pool sync, leveraged from kelven's work
2011-07-19 15:26:25 -07:00
alena c48c3edfbc bug 10271: don't include removed records when searching for local storage pool
status 10217: resolved fixed
2011-07-19 11:10:53 -07:00
Alex Huang d54f6d536a propagating transaction isolation fix for merovingian2 2011-07-18 16:48:49 -07:00