The issue happens randomly when hosts in a cluster get distributed across multiple management servers (MS). Hosts can get split in the following scenarios:
a. Add host – the MS on which add host is executed takes ownership of the host, so if two hosts belonging to the same cluster are added from two different MS, the cluster gets split.
b. scanDirectAgentToLoad – this runs every 90 seconds and checks whether any hosts need to be reconnected. The current host-scan logic can also lead to a split.
The idea is to fix (b) to ensure that hosts in a cluster are managed by the same MS. For (a), only the database entry is created, except when the host being added is the first in its cluster (in that case agent creation happens at the same time); (b) then takes care of the connection and agent creation. Since addHost only creates an entry in the DB, there is a small window where the host state shows as 'Alert' until (b) is scheduled and picks up the host to make the connection. The MS doing the add host immediately schedules a scan task and also notifies its peers to start their scan tasks.
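To illustrate the cluster-aware scan idea (this is a minimal sketch, not the actual CloudStack code; the host/cluster types and the ownership map are hypothetical), the scan could claim unmanaged hosts cluster-by-cluster so that two management servers never split one cluster:

    import java.util.*;
    import java.util.stream.*;

    // Sketch of fix (b): the periodic scan claims hosts per cluster, so every
    // host of a cluster ends up on the same management server.
    public class ClusterScanSketch {

        record Host(long id, long clusterId) {}

        // Returns the hosts this MS should load, never splitting a cluster.
        static List<Host> hostsToLoad(long myMsId, List<Host> unmanagedHosts, Map<Long, Long> clusterOwner) {
            List<Host> toLoad = new ArrayList<>();
            Map<Long, List<Host>> byCluster =
                    unmanagedHosts.stream().collect(Collectors.groupingBy(Host::clusterId));

            for (Map.Entry<Long, List<Host>> e : byCluster.entrySet()) {
                Long owner = clusterOwner.get(e.getKey());
                if (owner == null) {
                    // No MS owns this cluster yet: this MS claims the whole cluster.
                    clusterOwner.put(e.getKey(), myMsId);
                    toLoad.addAll(e.getValue());
                } else if (owner == myMsId) {
                    // Cluster already owned by this MS: pick up stragglers (e.g. freshly added hosts).
                    toLoad.addAll(e.getValue());
                }
                // Otherwise a peer MS owns the cluster; leave its hosts alone to avoid a split.
            }
            return toLoad;
        }

        public static void main(String[] args) {
            Map<Long, Long> clusterOwner = new HashMap<>();     // clusterId -> owning MS id
            List<Host> pending = List.of(new Host(1, 10),       // two hosts of cluster 10 ...
                                         new Host(2, 10),       // ... must land on the same MS
                                         new Host(3, 20));
            System.out.println(hostsToLoad(100L, pending, clusterOwner)); // claims clusters 10 and 20
        }
    }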
* send StartupAnswer right after the StartupCommand is received
* if post-processing goes wrong, send a ReadyCommand with the error message to the agent; the agent will then exit
* the attache is not forForward()
* and the Disconnect came through a cluster notification event
The fix prevents delayed processing of AgentDisconnect cluster notifications.
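A rough sketch of the handshake described above, using simplified placeholder types rather than the real agent framework classes: the StartupAnswer goes out as soon as the StartupCommand arrives, and a failing post-processor is reported back via a ReadyCommand carrying the error so the agent can exit:

    // Simplified handshake sketch; not the actual CloudStack agent manager.
    public class HandshakeSketch {

        interface AgentLink { void send(Object answer); }

        record StartupCommand(String hostGuid) {}
        record StartupAnswer(StartupCommand cmd) {}
        record ReadyCommand(String errorDetails) {}         // null details == ready

        static void onStartup(AgentLink link, StartupCommand cmd) {
            // 1. Acknowledge immediately so the agent is not left waiting on the socket.
            link.send(new StartupAnswer(cmd));

            try {
                // 2. Post-processing: create the attache, persist host state, notify peers, ...
                postProcess(cmd);
                link.send(new ReadyCommand(null));           // all good, agent proceeds
            } catch (Exception e) {
                // 3. Post-processing failed: tell the agent why; the agent exits on error.
                link.send(new ReadyCommand("startup post-processing failed: " + e.getMessage()));
            }
        }

        static void postProcess(StartupCommand cmd) {
            // placeholder for the attache creation / DB updates done by the real server
        }
    }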
73be77a4c1
I've renamed discover to discoverer to fix the issue. My ant debug was failing
with:
    [java] ERROR [utils.component.ComponentLocator] (main:) Unable to load configuration for management-server from components.xml
    [java] com.cloud.utils.exception.CloudRuntimeException: Unable to find class: com.cloud.hypervisor.kvm.discoverer.KvmServerDiscoverer
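For context (an illustration only, not CloudStack code): components.xml refers to the discoverer by its fully qualified class name and the component loader resolves it reflectively, so the package on disk has to match the configured name exactly, roughly like this:

    // Shows why a package rename breaks class loading: the configured name
    // must match the real package, otherwise ClassNotFoundException is thrown.
    public class DiscovererLookupSketch {
        public static void main(String[] args) {
            String configured = "com.cloud.hypervisor.kvm.discoverer.KvmServerDiscoverer";
            try {
                Class<?> clazz = Class.forName(configured);  // succeeds only if the package really is ...discoverer...
                System.out.println("loaded " + clazz.getName());
            } catch (ClassNotFoundException e) {
                // This is effectively the "Unable to find class" failure in the log above.
                System.out.println("Unable to find class: " + configured);
            }
        }
    }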
RB: https://reviews.apache.org/r/6239/
Sent-by: rohit.yadav@citrix.com
When a host is not considered for deployment because HVM is disabled on it, call that out in the logs for debugging.
Signed-off-by: Nitin Mehta <nitin.mehta@citrix.com>
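A minimal sketch of the kind of check-and-log this commit describes (the allocator and field names here are hypothetical, not the actual HostAllocator code):

    import java.util.logging.Logger;

    public class HvmCheckSketch {
        private static final Logger LOG = Logger.getLogger(HvmCheckSketch.class.getName());

        record HostInfo(long id, boolean hvmEnabled) {}

        static boolean suitableForDeployment(HostInfo host, boolean guestNeedsHvm) {
            if (guestNeedsHvm && !host.hvmEnabled()) {
                // Make the skip visible so operators can tell why the host was rejected.
                LOG.info("Skipping host " + host.id() + " for deployment: HVM is disabled on the host");
                return false;
            }
            return true;
        }
    }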
1) Fixed the bug where vm_instance.ha_enabled wasn't updated during a service offering upgrade
2) Added a new API - changeServiceForSystemVm - to support service offering upgrades for system VMs
3) Removed global config parameters that are no longer in use: consoleproxy.ram.size, consoleproxy.cpu.mhz, secstorage.vm.ram.size, secstorage.vm.cpu.mhz

Conflicts:
	server/src/com/cloud/server/ManagementServerImpl.java
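A sketch of what the ha_enabled fix in item 1 amounts to (the entity and method names below are hypothetical, not the actual ManagementServerImpl code): when a VM is moved to a new service offering, its HA flag has to be refreshed from that offering, otherwise the old value sticks around in vm_instance:

    public class UpgradeServiceOfferingSketch {

        static class ServiceOffering { long id; boolean offerHa; }
        static class VmInstance     { long id; long serviceOfferingId; boolean haEnabled; }

        static void upgradeServiceOffering(VmInstance vm, ServiceOffering newOffering) {
            vm.serviceOfferingId = newOffering.id;
            // The missing piece fixed by the commit: keep ha_enabled in sync with the offering.
            vm.haEnabled = newOffering.offerHa;
            // persist(vm);  // DAO update elided in this sketch
        }
    }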