Commit Graph

88 Commits

Author SHA1 Message Date
frank d1084bb383 fix unable to find built-in template 2011-10-27 19:19:19 -07:00
frank 24b82a7a89 Bug 11522 - New agent manager
call SearchCriteriaService interface instead of SearchCriteria2 instance
2011-10-06 10:32:07 -07:00
frank 89e04458b6 Bug 11522 - New agent manager
move all listxxx interface from HostDao to managers(ResourceManager, SecondaryStorageVmManager etc) with decent name using SearchCriteria2
or direct call SearchCriteria2 on demand
2011-10-04 14:35:26 -07:00
frank e8c3ff653d Bug 11522 - New agent manager
move maintanenceFailed to ResourceManager
2011-09-23 16:54:28 -07:00
alena a1331d1cfc Intermidiate checkin to Project feature:
1) Introduce new managers - ProjectManager and DomainManager. Moved all domain related code from AccountManager to DomainManager.
2) Moved some code from ManagementServerImpl to the correct managers.
3) New resource limit for Domain - Project
2011-09-20 18:35:28 -07:00
prachi 0eea1cb733 Bug 11404 - VM was in Running state, had null for a pod_id, basically didnt allow creation of subsequent vm's
Reviewed-by: Alex

Changes:
- When management server starts, it goes through all the pending work items from op_it_work table and schedules HA work for each. It used to mark each item as done. Instead we should keep the item as pending and let it get marked as Done after the HA work is done.
- Changes in VirtualMachineMgr::advanceStop() :
a) if we find a VM with null hostId, we stop the VM only if it is forced stopped.
b) if VM state transition to Stopping fails,for state Starting and Migrating we try to find the pending work item and then do cleanup the VM. In case state is Stopping we can cleanup directly.
c) We proceed releasing all resources only if state transitioned to 'Stopping'.
- Changes in HA:
a) Depend on VirtualMachineMgr::advanceStop() in case host is not found to do VM cleanup
- When Vm state between mgmt server and agent syncs from starting -> running, mark any pending work item as done.

Conflicts:

	server/src/com/cloud/vm/VirtualMachineManagerImpl.java
2011-09-15 19:06:19 -07:00
Kelven Yang 1b9552ea74 Let VmwareInvestigator return fake but meaningful investigation result 2011-09-14 17:04:28 -07:00
Kelven Yang 7a64d8fda4 add VmwareInvestigator and VmwareFencer, use short worker VM name to avoid vCenter truncation 2011-09-14 15:15:26 -07:00
anthony 4423da06a2 1. added timeout in Command Class, then each command can configure itself timeout, if timeout is not configed, use the default timeout , which is 30 minute
2. added following configurable timeout
       PrimaryStorageDownloadWait("Storage", TemplateManager.class, Integer.class, "primary.storage.download.wait", "10800", "In second, timeout for download template to primary storage", null),
       CreateVolumeFromSnapshotWait("Storage", StorageManager.class, Integer.class, "create.volume.from.snapshot.wait", "10800", "In second, timeout for create template from snapshot", null),
       CopyVolumeWait("Storage", StorageManager.class, Integer.class, "copy.volume.wait", "10800", "In second, timeout for copy volume command", null),
       CreatePrivateTemplateFromVolumeWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.volume.wait", "10800", "In second, timeout for CreatePrivateTemplateFromVolumeCommand", null),
       CreatePrivateTemplateFromSnapshotWait("Storage", UserVmManager.class, Integer.class, "create.private.template.from.snapshot.wait", "10800", "In second, timeout for CreatePrivateTemplateFromSnapshotCommand", null),
       BackupSnapshotWait("Storage", StorageManager.class, Integer.class, "backup.snapshot.wait", "10800", "In second, timeout for BackupSnapshotCommand", null),
2011-09-07 19:17:54 -07:00
anthony c683fda236 set timeout for CheckOnHostCommand to 50 s 2011-09-02 15:01:32 -07:00
anthony c7fb330914 put getConnection into try 2011-09-02 15:01:28 -07:00
frank b3478c377e Full opensource 2011-08-23 19:52:19 -07:00
alena 8a7feb8ec1 Merge branch '2.2.y'
Conflicts:
	agent/src/com/cloud/agent/resource/computing/LibvirtComputingResource.java
	api/src/com/cloud/agent/api/routing/LoadBalancerConfigCommand.java
	api/src/com/cloud/agent/api/to/FirewallRuleTO.java
	api/src/com/cloud/agent/api/to/IpAddressTO.java
	api/src/com/cloud/agent/api/to/PortForwardingRuleTO.java
	api/src/com/cloud/api/ApiConstants.java
	api/src/com/cloud/api/BaseCmd.java
	api/src/com/cloud/api/ResponseGenerator.java
	api/src/com/cloud/api/commands/CreateFirewallRuleCmd.java
	api/src/com/cloud/api/commands/CreateIpForwardingRuleCmd.java
	api/src/com/cloud/api/commands/CreateLoadBalancerRuleCmd.java
	api/src/com/cloud/api/commands/CreatePortForwardingRuleCmd.java
	api/src/com/cloud/api/commands/DeleteLoadBalancerRuleCmd.java
	api/src/com/cloud/api/commands/ListCapabilitiesCmd.java
	api/src/com/cloud/api/commands/UpdateNetworkCmd.java
	api/src/com/cloud/api/response/CapabilitiesResponse.java
	api/src/com/cloud/network/Network.java
	api/src/com/cloud/network/NetworkService.java
	api/src/com/cloud/network/firewall/FirewallService.java
	api/src/com/cloud/network/lb/LoadBalancingRule.java
	api/src/com/cloud/network/lb/LoadBalancingRulesService.java
	api/src/com/cloud/network/rules/FirewallRule.java
	api/src/com/cloud/network/rules/RulesService.java
	api/src/com/cloud/offering/NetworkOffering.java
	client/tomcatconf/commands.properties.in
	cloud.spec
	core/src/com/cloud/agent/resource/virtualnetwork/VirtualRoutingResource.java
	core/src/com/cloud/hypervisor/xen/resource/CitrixHelper.java
	core/src/com/cloud/hypervisor/xen/resource/CitrixResourceBase.java
	core/src/com/cloud/storage/template/DownloadManagerImpl.java
	core/src/com/cloud/vm/DomainRouterVO.java
	debian/cloud-deps.install
	patches/systemvm/debian/config/etc/init.d/cloud-early-config
	patches/systemvm/debian/config/root/ipassoc.sh
	patches/systemvm/debian/config/root/loadbalancer.sh
	scripts/vm/hypervisor/kvm/rundomrpre.sh
	scripts/vm/hypervisor/xenserver/vmops
	server/src/com/cloud/agent/manager/AgentAttache.java
	server/src/com/cloud/agent/manager/AgentManagerImpl.java
	server/src/com/cloud/agent/manager/AgentMonitor.java
	server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java
	server/src/com/cloud/alert/ClusterAlertAdapter.java
	server/src/com/cloud/api/ApiResponseHelper.java
	server/src/com/cloud/api/ApiServer.java
	server/src/com/cloud/cluster/ClusterManagerImpl.java
	server/src/com/cloud/configuration/Config.java
	server/src/com/cloud/configuration/ConfigurationManager.java
	server/src/com/cloud/configuration/ConfigurationManagerImpl.java
	server/src/com/cloud/configuration/DefaultComponentLibrary.java
	server/src/com/cloud/deploy/FirstFitPlanner.java
	server/src/com/cloud/ha/HighAvailabilityManagerImpl.java
	server/src/com/cloud/host/dao/HostDaoImpl.java
	server/src/com/cloud/hypervisor/xen/discoverer/XcpServerDiscoverer.java
	server/src/com/cloud/network/LoadBalancerVO.java
	server/src/com/cloud/network/NetworkManager.java
	server/src/com/cloud/network/NetworkManagerImpl.java
	server/src/com/cloud/network/dao/FirewallRulesDao.java
	server/src/com/cloud/network/dao/FirewallRulesDaoImpl.java
	server/src/com/cloud/network/element/DhcpElement.java
	server/src/com/cloud/network/element/VirtualRouterElement.java
	server/src/com/cloud/network/firewall/FirewallManagerImpl.java
	server/src/com/cloud/network/lb/LoadBalancingRulesManagerImpl.java
	server/src/com/cloud/network/router/VirtualNetworkApplianceManager.java
	server/src/com/cloud/network/router/VirtualNetworkApplianceManagerImpl.java
	server/src/com/cloud/network/rules/FirewallManager.java
	server/src/com/cloud/network/rules/FirewallRuleVO.java
	server/src/com/cloud/network/rules/PortForwardingRuleVO.java
	server/src/com/cloud/network/rules/RulesManagerImpl.java
	server/src/com/cloud/network/rules/StaticNatRuleImpl.java
	server/src/com/cloud/network/security/SecurityGroupListener.java
	server/src/com/cloud/network/security/SecurityGroupManagerImpl.java
	server/src/com/cloud/offerings/NetworkOfferingVO.java
	server/src/com/cloud/server/ConfigurationServerImpl.java
	server/src/com/cloud/server/ManagementServerImpl.java
	server/src/com/cloud/storage/StorageManager.java
	server/src/com/cloud/storage/StorageManagerImpl.java
	server/src/com/cloud/storage/dao/VMTemplateHostDaoImpl.java
	server/src/com/cloud/storage/download/DownloadMonitorImpl.java
	server/src/com/cloud/upgrade/DatabaseUpgradeChecker.java
	server/src/com/cloud/upgrade/dao/Upgrade228to229.java
	server/src/com/cloud/upgrade/dao/Upgrade229to2210.java
	server/src/com/cloud/user/AccountManagerImpl.java
	server/src/com/cloud/vm/UserVmManagerImpl.java
	server/src/com/cloud/vm/VirtualMachineManagerImpl.java
	server/src/com/cloud/vm/dao/DomainRouterDao.java
	server/src/com/cloud/vm/dao/DomainRouterDaoImpl.java
	setup/db/create-index-fk.sql
	setup/db/create-schema.sql
	setup/db/db/schema-222to224.sql
	setup/db/db/schema-227to228.sql
	setup/db/db/schema-228to229.sql
	setup/db/db/schema-229to2210.sql
	tools/testClient/README
	ui/scripts/cloud.core.instance.js
	utils/src/com/cloud/utils/SerialVersionUID.java
	utils/src/com/cloud/utils/db/ConnectionConcierge.java
	utils/src/com/cloud/utils/db/Merovingian2.java
	utils/src/com/cloud/utils/db/Transaction.java
	utils/src/com/cloud/utils/nio/Link.java
	utils/src/com/cloud/utils/nio/NioConnection.java
	utils/src/com/cloud/utils/time/InaccurateClock.java
2011-08-22 20:28:30 -07:00
Kelven Yang fdedbbc00e bug 10834: when VMware host is down, don't try to restat VMs on other host. VMware prohibits VM relocation when host is down 2011-08-17 18:11:55 -07:00
Kelven Yang 97e95fce82 bug 10834: when VMware host is down, don't try to restat VMs on other host. VMware prohibits VM relocation when host is down 2011-08-17 18:09:31 -07:00
Kelven Yang 5fc66d1c61 re-enable HA logic on VM state synchronization for VMware 2011-08-15 17:00:43 -07:00
Kelven Yang e59c14a758 Disable HA in CloudStack HA manager under VMware 2011-08-11 18:10:07 -07:00
Kelven Yang c498cf3ec6 Disable HA in CloudStack HA manager under VMware 2011-08-11 18:09:11 -07:00
Sheng Yang 4bc8686513 bug 10429: Backport redundant virtual router
Part 1

This backport contained:

commit 52317c718c25111c2535657139b541db0c9d1e1f
    bug 9154: Initial check in for enabling redundant virtual router

commit 54199112055d754371bfb141168fb5538bf6d6ea
    Add host verification for CheckRouterCommand

commit cef978a228c90056ead9be10cbc4de74c2b8de76
    Fix CheckRouterAnswer's isMaster report

commit 4072f0a6991ac3b63601a1764fbe14188965f62f
    Some build fixes and code refactoring for redundant router

commit 4d3350b7cd8ee2706a9bace4437fc194e36c8dd5
    Redundant Router: Fix OVS

commit 6a228830e7c46d819fa0c3317e159e041337e887
    Fix findByNetwork()/findByNetworkAndPod()'s return

commit c627777b3d5bdbcd60db4032cebd349a5b1ecd83
    Redundant Router: Fix isVmAlive()

commit e1275d2514adc41f8744f5107d4069c38be195f1
    Only issue CheckRouterCommand to redundant routers

And all modification to the scripts till
commit 4e3942462ed3fde3a3d7011e95839e2128fba514
logging changes

in the master branch.
2011-07-18 18:29:56 -07:00
anthony 0c53bddb16 bug 10628: root cause is CheckHealthCommand return false, XenServerInvestigator is not called
status 10628: resolved fixed
2011-07-15 10:12:54 -07:00
anthony 18003deedf bug 10628: root cause is CheckHealthCommand return false, XenServerInvestigator is not called
status 10628: resolved fixed
2011-07-14 20:42:26 -07:00
Alex Huang 852cf0e6c7 fix migration npe when recovering 2011-07-09 08:32:10 -07:00
Alex Huang db5afa4994 fix migration npe when recovering 2011-07-09 08:31:44 -07:00
anthony b885915f1e bug 10628: if private network and storage network use the same nic, MS will start HA very quickly within 20 seconds, it breaks heartbeat check, which require 60 seconds interval. add 60s sleep before trying to HA on VMs
status 10628: resolved fixed
2011-07-07 12:39:46 -07:00
anthony 931dcff710 bug 10628: if private network and storage network use the same nic, MS will start HA very quickly within 20 seconds, it breaks heartbeat check, which require 60 seconds interval. add 60s sleep before trying to HA on VMs
status 10628: resolved fixed
2011-07-07 12:36:39 -07:00
Alex Huang d39048faca bug 10260: propagate ha and deployment planner fixes 2011-06-13 17:33:09 -07:00
Alex Huang 1d7e70acd1 bug 10260: propagate ha and deployment planner fixes 2011-06-13 17:35:20 -07:00
Alex Huang d01e20c443 bug 10094: The problem was we added code that won't add any more ha work items if it already has one. However, that is wrong. HA Manager stores the existing snapshot of the VM state machine. Before working on HA for a VM, it checks to see if that snapshot has been changed. So by not scheduling HA work, we've effectively made HA not work under multi-failure situations. I've fixed by removing that code and instead at the time of performing HA, do a quick check to see if there are pwork underway for the same VM and work scheduled in the future for that VM. If there are work scheduled in the future, then we simply cancel the current work. If there are already work underway, then we retry again in 1 minute. 2011-06-12 09:25:48 -07:00
Alex Huang 6137f216b1 bug 10094: The problem was we added code that won't add any more ha work items if it already has one. However, that is wrong. HA Manager stores the existing snapshot of the VM state machine. Before working on HA for a VM, it checks to see if that snapshot has been changed. So by not scheduling HA work, we've effectively made HA not work under multi-failure situations. I've fixed by removing that code and instead at the time of performing HA, do a quick check to see if there are pwork underway for the same VM and work scheduled in the future for that VM. If there are work scheduled in the future, then we simply cancel the current work. If there are already work underway, then we retry again in 1 minute. 2011-06-12 09:18:21 -07:00
Sheng Yang 7e2fe6b6c9 Redundant Router: Fix isVmAlive() 2011-06-09 15:41:12 -07:00
Sheng Yang 62ac899091 bug 9154: Initial check in for enabling redundant virtual router
This patch enable redundant virtual routers.

1. To enable this feature, db need to be updated using follow SQL by now(we
would get a UI way later):

UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND
system_only=0;

2. System would try to start up two routers at different hosts. But if there is
only one host in the zone, system would start up two routers on it.

3. The failover part is using keepalived, and connection tracking part is using
conntrackd. There would be one master router and one backup router. The status
of router(master or backup) can be query from the database table domain_router
now. Management server would update the status every 30s by default.

4. The routers for the same zone would use same external NIC(same ip and mac).
The script used for fail-over would ensure only one external NIC present in the
network at any time.

5. Currently management server don't got the ability to stop one of router is
both of them reported as master. The feature is in the todo list.

After two routers start up, disconnect anyone of them, the guest network
shouldn't be affected, and established connection(http, ssh, etc.) should still
works. The fail-over on gateway part should be 3~4 seconds.

Currently the patch works with KVM. Would deal with vmware and XenServer soon.
2011-06-07 14:47:45 -07:00
Alex Huang d9e0bcfa1e bug 10126: Renamed getPodId() to getPodIdToDeployIn() 2011-06-03 22:17:08 -07:00
Alex Huang 154c6d9021 Propagating 1345af2a0e84684a804bde5b281c30df72f148a0 2011-05-10 05:52:39 -07:00
Alex Huang efedf018c8 propagate b3aea1878395af343e18382b7f1c376b5be04567 2011-05-10 05:48:29 -07:00
Alex Huang 6ce656220f bug 9643: propagate fix from 2.2.4 2011-04-29 17:51:42 -07:00
Alex Huang 99bc15f64a changed getName to getHostname 2011-04-29 08:34:10 -07:00
prachi c2824edc03 Bug 9446: Investigator reports that a system vm is down even if it isn't....
Changes:
- Added new Investigator 'ManagementIPSystemVMInvestigator' that checks if Vm is alive only for System VM's that have a management IP address.
- If no management IP is found, ping test cannot be done, so this investigator would return null in that case.
- Current implementation InvestigatorImpl is renamed as 'UserVmDomRInvestigator' and does the ping test for user VMs only.
- Corrected the ping test code that was checking a hard-coded string. Now if the ping answer is negative, we just return null
- Added the new investigator to components.xml
2011-04-28 12:28:51 -07:00
alena 7255d68875 HA: no need to investigate why vm was stopped on host when host is being Dicsonnected with investigate=false option 2011-04-22 13:38:25 -07:00
alena f881d394e2 bug 9415: deleteHost - cleanup vms running on the host to be removed
status 9415: resolved fixed
2011-04-20 15:50:10 -07:00
Edison Su d6b5acb852 bug 8532,8755: don't create multiple HA work if there already has one of HAwork of this VM is created, but not finished
status 8532: resolved fixed
status 8755: resolved fixed
2011-04-14 17:46:54 -04:00
Frank 92155522f2 Add license header to files 2011-04-14 11:23:14 -07:00
Kelven Yang 1b9cbd9166 bug 9223, 9224: persist runid to form cluster session, based on cluster session and DB condition to issue isolation notification for self-fencing 2011-04-13 15:13:54 -07:00
Alex Huang 075fba5899 stackmaid is now taskmanager 2011-04-05 10:17:22 -07:00
prachi 53f8ebf6f0 Bug 9043 - VM manual migration - when destination host is out of memory for migration, VMs being migrated remained in 'migrating' state
Changes:
- When migration fails we try to do cleanup on the destination host agent. The AgentUnavailableException in this cleanup was not caught.
-Due to that other cleanup like reverting capacity allocated and vm state were skipped.
-Fix is to catch the AgentUnavailableException so that rest of the cleanup can happen.
- Also corrected the exceptions in various cases of migration failure.
- In case the VM is still starting, HA should schedule a retry. Introduced a special migration exception for handling this.
2011-04-04 17:30:08 -07:00
alena 745aa1d66a bug 8448: generate Alert when vm is scheduled for HA
status 8448: resolved fixed
2011-03-24 17:37:18 -07:00
Alex Huang 6b0d4947ed bug 8529: propagated to master. Added junit test support to ant 2011-02-16 17:40:58 -08:00
anthony 80a328034c bug 8609: when failed to start a VM in HA (due to domr is not migrated), a runtimeException is thrown out, caused HA for this VM is not resheduced.
status 8609: resolved fixed
2011-02-16 14:20:28 -08:00
Alex Huang db7bc893b9 added cluster awareness to vm start/stop 2011-02-11 17:03:04 -08:00
Alex Huang c22d4948d0 Added context to ha work 2011-02-08 15:38:26 -08:00
Alex Huang b322fb072f bug 8186: Changed the investigator to use the new networking 2011-02-07 16:04:23 -08:00