Commit Graph

326 Commits

Author SHA1 Message Date
Koushik Das ab60e9eae9 CLOUDSTACK-4636: In a scaled-up setup, all VMs in a cluster were stopped and/or started after a management server restart
The issue occurs when more than one thread processes connect for a host simultaneously. VM full sync is not designed to work in this scenario, and as a result user VMs may get stopped incorrectly.
The direct agent scan task runs at regular intervals (direct.agent.scan.interval, defaulted to 90 secs) and identifies hosts that need to be processed for connect. In a normal scenario hosts mostly get connected within that interval and there are no issues. But if for some reason the connect process takes longer and is not completed by the time the next agent scan runs, the same hosts may get picked up again based on the DB state. More than one thread may then be processing connect for the same host.
The fix is to check whether a thread is already processing connect for a host; if so, all subsequent threads for that host simply bail out. There may also be a scenario where one thread has already completed processing connect but another thread was scheduled before that and would repeat the same work. This is also prevented by appropriate checks.
2013-09-10 11:54:32 +05:30
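The guard described in CLOUDSTACK-4636 can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the class `HostConnectGuard`, its method names, and the use of in-memory sets are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: at most one thread processes connect for a host,
// and a thread scheduled before an earlier one finished does not repeat the work.
final class HostConnectGuard {
    // Hosts whose connect is currently being processed by some thread.
    private final Set<Long> inProgress = ConcurrentHashMap.newKeySet();
    // Hosts whose connect has already completed.
    private final Set<Long> connected = ConcurrentHashMap.newKeySet();

    // Runs connectWork for hostId unless another thread is already doing so
    // or has already finished. Returns true only if this call did the work.
    boolean connectOnce(long hostId, Runnable connectWork) {
        if (connected.contains(hostId)) {
            return false; // an earlier thread already completed connect
        }
        if (!inProgress.add(hostId)) {
            return false; // another thread is processing connect: bail out
        }
        try {
            // Re-check: a thread may have completed between the two checks above.
            if (connected.contains(hostId)) {
                return false;
            }
            connectWork.run();
            connected.add(hostId); // mark done before releasing the in-progress slot
            return true;
        } finally {
            inProgress.remove(hostId);
        }
    }

    // Allows a later reconnect once the host is known to be disconnected.
    void markDisconnected(long hostId) {
        connected.remove(hostId);
    }
}
```

Marking the host as connected before releasing the in-progress entry is what lets the re-check catch a late-scheduled thread in this sketch.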
Edison Su 6766c01fc6 CLOUDSTACK-3535: still maintain that stopping the agent won't trigger HA 2013-08-07 14:42:04 -07:00
Edison Su 33d06d25b8 CLOUDSTACK-3535: add kvminvestigator to investigate kvm host 2013-08-06 18:50:33 -07:00
Min Chen 9f9f7d3ffd Remove CLOUDSTACK-3513 debugging messages. 2013-07-22 16:15:45 -07:00
Min Chen 251efff69c CLOUDSTACK-3513: add debug message to diagnose copyIso issue on
automation setup where DownloadCommand is never sent.
2013-07-18 16:01:55 -07:00
Harikrishna Patnala ec2bf09284 CLOUDSTACK-2794: Global parameter "router.template.id" should be removed. The parameter was not in use. We use the zone/global configuration parameters router.template.xenserver/vmware/hyperv/kvm/lxc to deploy the router
Signed-off-by: Abhinandan Prateek <aprateek@apache.org>
2013-06-26 15:56:40 +05:30
Min Chen 5b76e4914c Remove sendToSecStorage methods from agentManager to use EndPoint
instead.
2013-04-24 16:21:41 -07:00
Min Chen 9c584b5500 Use EndPoint to send local/remote command, and hide agentMgr message
passing.
2013-04-22 13:21:28 -07:00
Min Chen 1b3994e180 Fix copyTemplateCmd. 2013-04-16 16:38:14 -07:00
Hugo Trippaers f1259d50bd Fix for _pingTimeout being 0 in AgentMonitor
With commit d79f1f6fdc the AgentMonitor
was replaced with a pluggable service. However the ping timeout in the
original constructor was not passed on anymore, leading to a default
pingTimeout of 0. This would fail all agents constantly.

Modified the startMonitor command to take a pingtimeout as an argument
and instruct AgentManagerImpl to pass it along.
2013-04-04 14:23:42 +02:00
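Why a ping timeout of 0 fails every agent can be seen in a small sketch. The class, field, and method names below are hypothetical and only mirror the description in the commit message; the positive-value check is an added assumption, not something the commit states.

```java
// Hypothetical sketch of a monitor that receives its ping timeout explicitly.
final class PingMonitorSketch {
    private long pingTimeoutSeconds; // stays 0 if nobody passes a value in

    // The caller passes the timeout along instead of relying on a default.
    void startMonitor(long pingTimeoutSeconds) {
        if (pingTimeoutSeconds <= 0) {
            // Assumption for this sketch: reject values that would fail all agents.
            throw new IllegalArgumentException("ping timeout must be positive");
        }
        this.pingTimeoutSeconds = pingTimeoutSeconds;
    }

    // An agent is considered behind when its last ping is older than the timeout.
    // With a timeout of 0, any agent whose last ping is in the past is behind.
    boolean isBehindOnPing(long lastPingSeconds, long nowSeconds) {
        return nowSeconds - lastPingSeconds > pingTimeoutSeconds;
    }
}
```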
Edison Su 409ec9c6b6 CLOUDSTACK-1426: We have a strong implication that a VO must implement an interface, otherwise EntityManagerImpl can't the vo 2013-03-07 18:25:57 -08:00
Kelven Yang 333dd810d2 CLOUDSTACK-1339: Using the Spring interface injection pattern to avoid using CGLIB proxying mode. Spring with CGLIB proxying will conflict with CGLIB usage in CloudStack DB code; CloudStack's CGLIB usage can cause Spring to lose track of its proxied object and therefore create a massive number of objects in memory 2013-03-05 19:03:30 -05:00
Min Chen bd4661e467 CLOUDSTACK-1137: force reconnect to a disconnected host throws error. 2013-02-14 17:57:41 -08:00
Kelven Yang 176523254e Improve component lifecycle management with system run-level concept 2013-01-30 15:21:02 -08:00
Kelven Yang 2c5859dbd4 Bring javelin back to the status of being able to start System VMs after another round of master branch merge 2013-01-18 19:15:32 -08:00
Alex Huang 10d9c019a9 All merge conflicts resolved 2013-01-18 12:14:57 -08:00
Koushik Das e45a9f3aed CLOUDSTACK-803: HA gets triggered even when the host investigator is unable to determine the state of the host. HA won't be triggered in case the host investigator is not able to determine the state
Signed-off-by: Koushik Das <koushik.das@citrix.com>
Signed-off-by: Abhinandan Prateek <aprateek@apache.org>
2013-01-18 17:20:51 +05:30
Koushik Das cd37e22f9b CLOUDSTACK-810: Make DirectAgent thread pool size configurable. Removed hard-coding of the directagent thread pool size; it is now read from configuration
Signed-off-by: Chiradeep Vittal <chiradeep@apache.org>
2013-01-17 17:21:52 -08:00
Alex Huang 56e5fbdee2 removed import of componentlocator and inject from all files 2013-01-10 11:44:47 -08:00
Alex Huang f40e7b7511 removed componentlocator and inject 2013-01-10 11:05:20 -08:00
Alex Huang 0bcb64605f all built with the latest 2013-01-09 05:02:39 -08:00
Kelven Yang b274c570f9 Cleanup places that use explicit wiring of the components 2013-01-08 17:45:33 -08:00
Alex Huang 7f3a748d6c Merge branch 'javelin' of https://git-wip-us.apache.org/repos/asf/incubator-cloudstack into javelin 2013-01-08 14:46:38 -08:00
Kelven Yang 32e67f60d4 Work with Spring proxy-ed object 2013-01-08 14:24:19 -08:00
Alex Huang 30f2565d98 Merge branch 'api_refactoring' into javelin 2013-01-08 12:36:04 -08:00
Kelven Yang d79f1f6fdc Replace Adapters and PluggableServices, use Spring to load them 2012-11-07 15:03:24 -08:00
Kelven Yang cea8f3bf37 Switch inject annotation to javax and let ComponentLocator to recognize both the new and original inject annotation 2012-11-07 15:03:22 -08:00
Kelven Yang aab02e2743 Add Spring annotation to major components 2012-11-07 14:53:39 -08:00
Edison Xu b101dc7279 KVM agent connect:
* send StartupAnswer right after StartupCommand is received
* if the post processor goes wrong, send out readycommand with an error message to the agent, then the agent will exit
2012-11-05 10:00:16 -08:00
Alena Prokharchyk a5077968db CS-16592: process handleConnectedAgent in a separate thread pool 2012-11-02 10:47:14 -07:00
Alena Prokharchyk 62607c9a75 HandleDisconnect - don't update the DB when the disconnect event is happening as a part of MS Cluster notification
Reviewed-by: Frank Zhang
2012-11-02 09:59:37 -07:00
Edison Su 9a9c96df64 Patch fixes file names and imports wherever used, in files introduced in
73be77a4c1
I've renamed discover to discoverer to fix the issue. My ant debug fails
with:
     [java] ERROR [utils.component.ComponentLocator] (main:) Unable to
load configuration for management-server from components.xml
     [java] com.cloud.utils.exception.CloudRuntimeException: Unable to
find class: com.cloud.hypervisor.kvm.discoverer.KvmServerDiscoverer

RB: https://reviews.apache.org/r/6239/
Send-by: rohit.yadav@citrix.com
2012-07-31 10:38:11 -07:00
Edison Su 7a0a9231c3 Move KVM related code into plugins/hypervisor/kvm, a new jar file is
created: cloud-kvm.jar
2012-07-30 14:55:47 -07:00
Chip Childers 8f71a2927f License header updates for the server folder. 2012-07-02 08:58:10 -04:00
frank 2f634c0913 Switch to Apache license 2012-04-03 04:50:05 -07:00
Edison Su 289a641d4f bug 13789: don't shutdown host if it's a forward agent
status 13789: resolved fixed
Reviewed-by: frank
2012-02-17 13:04:03 -08:00
prachi 63fd5d1f64 Bug 13703 - [External Service Providers] Unable to find a Discoverer to load the resource: 1 for hypervisor type : null
Changes:
- in case of external service providers, there is no discoverer that could load the resource.
- So we have to rely on agentMgr to load the resource as earlier.
2012-02-14 12:17:25 -08:00
prachi dbe2305352 Bug 13099 table physical-network-traffic-types needs to be updated when xen network device setting is modified in the global settings
Changes:
- We do not need these global settings anymore. They will be hidden since 3.0
- The default traffic label will be picked from the global setting which is null by default. When traffic label is null it means the resource uses tag on the default gateway
- Changes to invoke discoverer to reload the resource object on host connection
- Since a zone can have many physical networks, there can be multiple guest, public networks. Only the zone wide storage and management traffic label will be stored in host_details henceforth.
- If traffic labels are updated, discoverer should update the host_details
2012-02-07 18:41:23 -08:00
anthony 5c0b585aa0 bug 12844: fixed a regression
reviewed-by : edison

Conflicts:

	server/src/com/cloud/agent/manager/AgentManagerImpl.java
2012-01-31 17:12:49 -08:00
anthony cb8f55a6f6 bug 12844, 13394: 1. if connect to host fails, don't need to investigate
2. add ha parameter to disconnect host to indicate if HA VMs on this host

status 12844, 13394: resolved fixed

reviewed-by : edison

Conflicts:

	server/src/com/cloud/agent/manager/AgentManagerImpl.java
	server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java
2012-01-31 15:33:39 -08:00
frank 748603f62d Bug 13269 - vmware - host put in maintenance mode> cancel maintenance mode> host remains in Connecting state
We use 'update count' to make sure agent status transformation is atomic.
However, atomic means success or fail, which is not true for agent status.
Some important transformations occasionally fail because of a race condition where
something else is changing the status simultaneously, which finally leaves the agent stuck in a
wrong status.

Use a reentrant lock to serialize the agent status transformation. This in-memory lock
works in a clustered environment as well because in our design an agent is only active
in one mgmt server

status 13269: resolved fixed
2012-01-24 15:14:02 -08:00
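Serializing status changes with a reentrant lock, as the fix for Bug 13269 describes, might look like the sketch below. The `Status` and `Event` values and the transition rules are invented for illustration and are not CloudStack's actual state machine.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: each status change is computed and applied under one lock,
// so concurrent callers wait their turn instead of failing an optimistic update.
final class AgentStatusSketch {
    enum Status { CONNECTING, UP, MAINTENANCE }
    enum Event { AGENT_CONNECTED, MAINTENANCE_REQUESTED, MAINTENANCE_CANCELLED }

    private final ReentrantLock lock = new ReentrantLock();
    private Status status = Status.CONNECTING;

    // Applies the event to the current status while holding the lock.
    Status handle(Event event) {
        lock.lock();
        try {
            status = next(status, event);
            return status;
        } finally {
            lock.unlock();
        }
    }

    // Illustrative transition rules only.
    private static Status next(Status current, Event event) {
        switch (event) {
            case AGENT_CONNECTED:
                return Status.UP;
            case MAINTENANCE_REQUESTED:
                return Status.MAINTENANCE;
            case MAINTENANCE_CANCELLED:
                return current == Status.MAINTENANCE ? Status.CONNECTING : current;
            default:
                return current;
        }
    }
}
```

Because the lock lives in memory, this only works under the assumption stated in the commit message: an agent is active on a single management server at a time.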
Edison Su bced9a6e48 advanced startup command 2012-01-20 11:54:32 -08:00
frank 89b9c51d34 Bug 13189 - Exception logged while removing host
status 13189: resolved fixed
2012-01-19 16:50:36 -08:00
Alena Prokharchyk b1c60b9d60 bug 12964: createPhysicalNetwork/addTrafficType is no longer a part of createZone API 2012-01-10 13:55:09 -08:00
Alena Prokharchyk c581506103 bug 12306: list* command revamp 2012-01-09 10:07:42 -08:00
Alena Prokharchyk d56d1f699d bug 12790: use processDisconnect() when disconnect the agent during agent LB process
status 12790: resolved fixed

Conflicts:

	api/src/com/cloud/host/Status.java
	server/src/com/cloud/agent/manager/AgentManagerImpl.java
	server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java
2011-12-30 10:03:56 -08:00
kishan e2cb4f94d6 bug 12337: Encrypt only password in host_detail table. Removed unused and duplicate references of HostDetailDao
status 12337: resolved fixed
reviewed-by: Abhi
2011-12-20 19:28:41 +05:30
frank 5d661c1e9d Fix searchcritera2 in agent monitor
get ha code back in agent manager
2011-12-08 16:17:51 -08:00
Abhinandan Prateek d90e19ae28 bug 11825: removing the trace as from the message the origin of problem can be easily traced. 2011-11-24 11:15:51 +05:30
anthony 09d89b3dc3 add more logs 2011-11-01 19:34:39 -07:00