cloudstack

Commit Graph

Author	SHA1	Message	Date
Rohit Yadav	366d82e292	FR12 (CLOUDSTACK-9993): Secure Agent Communications (#38). This introduces a new certificate authority (CA) framework that allows pluggable CA provider implementations to handle certificate issuance, revocation and propagation. The framework injects itself into `NioServer` to handle agent connections securely. The framework adds an assumption in `NioClient` that a keystore with the known name `cloud.jks`, if available, will be used for SSL negotiation and handshake. This includes a default 'root' CA provider plugin, which creates its own self-signed root certificate authority on first run and uses it to issue and provision certificates to CloudStack agents such as the KVM, CPVM and SSVM agents, and also to the management server for peer clustering. Additional changes and notes: - A comma-separated list of management server IPs can be set in the 'host' global setting. Newly provisioned agents (KVM/CPVM/SSVM etc.) will get a randomized comma-separated list and will attempt connection or reconnection in the provided order. This removes the need for a TCP LB on port 8250 (default) of the management server(s). - All fresh deployments will enforce two-way SSL authentication, where connecting agents are required to present certificates issued by the 'root' CA plugin. - Existing environments will continue to use one-way SSL authentication after upgrade, and connecting agents will not be required to present certificates. - A script `keystore-setup` is responsible for initial keystore setup and CSR generation on the agent/hosts. - A script `keystore-cert-import` is responsible for importing the provided certificate payload into the Java keystore file. - Agent security (keystore, certificates etc.) is set up initially using SSH, and later provisioning is handled over an existing agent connection using command-answers. The supported clients and agents are limited to CPVM, SSVM and KVM agents, and clustered management servers (peering).
- Certificate revocation does not terminate an existing agent-mgmt server connection; however, a revoked certificate presented during an SSL handshake is rejected. - The older `cloudstackmanagement.keystore` is deprecated and will no longer be used by mgmt server(s) for SSL negotiation and handshake. New keystores will be named `cloud.jks`; additional SSL certificates for use with Tomcat etc. should not be imported into it. The `cloud.jks` keystore is strictly used for agent-server communications. - Management server keystores are validated and renewed on startup only; their validity is the same as that of the CA certificates. New APIs: - listCaProviders: lists all available CA provider plugins - listCaCertificate: lists the CA certificate(s) - issueCertificate: issues an X509 client certificate with or without a CSR - provisionCertificate: provisions a certificate to a host - revokeCertificate: revokes a client certificate using its serial Global settings for the CA framework: - ca.framework.provider.plugin: The configured CA provider plugin - ca.framework.cert.keysize: The key size for certificate generation - ca.framework.cert.signature.algorithm: The certificate signature algorithm - ca.framework.cert.validity.period: Certificate validity in days - ca.framework.cert.automatic.renewal: Certificate auto-renewal setting - ca.framework.background.task.delay: CA background task delay/interval - ca.framework.cert.expiry.alert.period: Days to check and alert on expiring certificates Global settings for the default 'root' CA provider: - ca.plugin.root.private.key: (hidden/encrypted) CA private key - ca.plugin.root.public.key: (hidden/encrypted) CA public key - ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate - ca.plugin.root.issuer.dn: The CA issuer distinguished name - ca.plugin.root.auth.strictness: Whether clients are required to present certificates - ca.plugin.root.allow.expired.cert: Whether clients with expired certificates are allowed UI changes: - Button to download/save the CA certificates.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2017-09-26 09:19:31 +05:30
Rohit Yadav	876fc7434d	APPLE-165: Host HA management and HA provider for KVM. Host-HA offers investigation, fencing and recovery mechanisms for hosts that are malfunctioning for any reason. It uses Activity and Health checks to determine the current host state, based on which it may degrade a host or try to recover it. If recovery fails, it may try to fence the host. The core feature is implemented in a hypervisor-agnostic way, with two separate driver/provider implementations for the Simulator and KVM hypervisors. The framework also allows providers for other hypervisors to be implemented in the future. The Host-HA provider implementation for the KVM hypervisor uses the out-of-band management subsystem to issue IPMI calls to reset (recover) or power off (fence) a host. The Host-HA provider implementation for the Simulator provides a means of testing and validating the core framework implementation. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2017-01-18 18:18:53 +05:30
Rohit Yadav	a5de2714e9	CLOUDSTACK-9299: Out-of-band Management for CloudStack. Adds support for accessing a host’s out-of-band management interface (e.g. IPMI, iLO, DRAC, etc.) to manage host power operations (on/off etc.) and to query the current power state in CloudStack. Given the wide range of out-of-band management interfaces, such as iLO and iDRAC, the service implementation allows separate drivers to be developed as plugins. This feature comes with an ipmitool-based driver that uses ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any out-of-band management interface that supports IPMI 2.0. This feature enables the following common use cases: - Restarting stalled/failed hosts - Powering off under-utilised hosts - Powering on hosts for provisioning or to increase capacity - Allowing system administrators to see the current power state of the host For testing this feature, `ipmisim` can be used: https://pypi.python.org/pypi/ipmisim FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2016-05-10 13:16:03 +05:30
Rohit Yadav	fb1069ace9	agent: don't investigate if host is null, send alert instead Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2015-02-05 16:42:56 +05:30
Anthony Xu	c52e14730e	When a host ping times out and CCP cannot determine the host status, put the host in Alert status; no VM HA.	2014-10-22 15:07:40 -07:00
Anthony Xu	0141b37784	CLOUDSTACK-7761: Revert "when system VM ping times out, stop system VM" This reverts commit `ee23be1942`.	2014-10-21 17:21:17 -07:00
Anthony Xu	ee23be1942	when system VM ping times out, stop system VM (cherry picked from commit `847e1e47ae`)	2014-10-13 00:11:21 -04:00
Hugo Trippaers	8a13f44b44	Remove duplicate field in constructor	2014-09-18 16:02:26 +02:00
Anthony Xu	e5a91e40dd	in tagCommand, AsyncJobExecutionContext doesn't need to be created if it doesn't exist	2014-09-17 18:15:41 -07:00
Min Chen	d5a8f1d875	CLOUDSTACK-7553: Clean up cached agentMap and pingMap in case of agents connecting back to a different MS.	2014-09-15 17:37:51 -07:00
Likitha Shetty	8ce6eba549	CLOUDSTACK-7415. Host remains in Alert after vCenter restart. Management server PingTask should update PingMap entry for an agent only if it is already present in the Management Server's PingMap.	2014-08-25 13:24:28 +05:30
Santhosh Edukulla	e4d6cd8e6a	Fixed Coverity-reported concurrency issues Signed-off-by: Santhosh Edukulla <santhosh.edukulla@gmail.com>	2014-08-05 12:16:08 +05:30
Santhosh Edukulla	b7d3f1bd30	Fixed a few Coverity issues for resource synchronization	2014-08-04 16:09:26 +05:30
Anthony Xu	330c4ba578	completed the new vmsync TODOs in the code. removed old vmsync logic	2014-07-28 12:51:37 -07:00
Kelven Yang	f756d4aa33	Make job info universally available across management server and resource agents	2014-06-24 16:28:22 -07:00
Koushik Das	d5754d9101	CLOUDSTACK-6740: Direct agent command throttling improvements. List of changes: 1. Created a separate thread pool for handling cron and ping tasks. The size of the pool is based on direct.agent.pool.size. The existing direct agent pool will run all commands other than cron and ping. 2. For normal tasks (generated as part of user/admin API calls), if the throttle limit is reached, tasks are queued for execution once threads become available. 3. For cron and ping tasks (generated internally by the MS, such as ping, VM sync etc.), if the throttle limit is reached, these are rejected. Since they are generated internally, they can be rejected without any issues.	2014-05-22 14:15:42 +05:30
Ding Yuan	c031eb7d38	CLOUDSTACK-6242: exception handling improvements Signed-off-by: Daan Hoogland <daan@onecht.net>	2014-04-15 08:07:15 +02:00
Koushik Das	54e9a98e8b	CLOUDSTACK-6362: Parallel VM deployment - direct.agent.thread.cap needs to default to 1.0 (currently 0.1) to allow for parallel VM deployments.	2014-04-09 12:28:01 +05:30
Alex Huang	4ebb92c492	Fixed some warnings about using a deprecated constructor	2014-03-25 16:48:27 -07:00
Alex Huang	f445274ed3	Added a config to enable checking whether a db transaction is wrapped around communications with the agent. If it is, an exception is thrown. This assert has been there all along because it is part of CloudStack's design principle not to use db transactions as a way to enforce atomicity when executing operations on hardware resources. However, the assert has been ignored since the move to Maven, which does not make it easy to enable asserts. Since then, many commands have been added that actually run within a db transaction. This is a big no-no: the remote operation may take a long time and the db can close the connection, causing a rollback of the transaction. We should not depend on transactions to enforce atomicity anyway.	2014-03-25 16:35:49 -07:00
Kelven Yang	90262a81ec	Do not do investigation for SSVM/CPVM agent host upon disconnect.	2014-02-28 15:36:00 -08:00
Hugo Trippaers	26b32141a8	Findbugs : Fixes for several findings Made a comment on the use of ConcurrentHashMap for _agent Conflicts: engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java	2014-02-14 18:37:45 +01:00
Devdeep Singh	c75f8bcc06	CLOUDSTACK-5420: The agent manager wasn't transitioning the host to maintenance mode if there were no VMs running on the host. Made the change to do so.	2013-12-26 11:15:51 +05:30
Alex Huang	be5e5cc641	All Checkstyle problems corrected	2013-12-12 12:26:07 -08:00
Alena Prokharchyk	bd6f706b72	CLOUDSTACK-5261: added support for Alert publishing via ROOT Admin API Conflicts: engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java engine/storage/image/src/org/apache/cloudstack/storage/image/TemplateServiceImpl.java engine/storage/volume/src/org/apache/cloudstack/storage/datastore/provider/DefaultHostListener.java engine/storage/volume/src/org/apache/cloudstack/storage/volume/VolumeServiceImpl.java plugins/hypervisors/hyperv/src/com/cloud/hypervisor/hyperv/discoverer/HypervServerDiscoverer.java plugins/hypervisors/vmware/src/com/cloud/hypervisor/vmware/VmwareServerDiscoverer.java plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpServerDiscoverer.java server/src/com/cloud/alert/AlertManagerImpl.java server/src/com/cloud/alert/ConsoleProxyAlertAdapter.java server/src/com/cloud/alert/SecondaryStorageVmAlertAdapter.java server/src/com/cloud/configuration/ConfigurationManagerImpl.java server/src/com/cloud/ha/HighAvailabilityManagerExtImpl.java server/src/com/cloud/ha/HighAvailabilityManagerImpl.java server/src/com/cloud/network/router/VirtualNetworkApplianceManagerImpl.java server/src/com/cloud/resourcelimit/ResourceLimitManagerImpl.java server/src/com/cloud/storage/snapshot/SnapshotManagerImpl.java server/src/com/cloud/vm/UserVmManagerImpl.java usage/src/com/cloud/usage/UsageAlertManagerImpl.java usage/src/com/cloud/usage/UsageManagerImpl.java listAlerts: introduced new parameter "name" to the alertResponse Conflicts: api/src/org/apache/cloudstack/api/command/admin/resource/ListAlertsCmd.java server/src/com/cloud/alert/AlertManagerImpl.java usage/src/com/cloud/usage/UsageAlertManagerImpl.java Added new Admin API - generateAlert. 
Available to ROOT admin only Conflicts: api/src/org/apache/cloudstack/alert/AlertService.java api/src/org/apache/cloudstack/api/BaseCmd.java usage/src/com/cloud/usage/UsageAlertManagerImpl.java listAlerts: implemented search by alert name Conflicts: api/src/org/apache/cloudstack/alert/AlertService.java api/src/org/apache/cloudstack/api/command/admin/resource/ListAlertsCmd.java engine/schema/src/com/cloud/alert/AlertVO.java	2013-12-04 10:05:46 -08:00
Alex Huang	d620df2bdd	Reformatted all of the code.	2013-11-21 06:15:26 -08:00
Alex Huang	8d62744681	Reformat all source code. Added checkstyle to check the source code	2013-11-20 07:26:53 -08:00
Laszlo Hornyak	6f3688d13d	Fill the creationMonitors based on priority. If there is no priority, append to the end of the list. Signed-off-by: Laszlo Hornyak <laszlo.hornyak@gmail.com>	2013-11-08 21:51:04 +01:00
Koushik Das	269a4ef11e	CLOUDSTACK-4855: Throttle based on the # of outstanding requests to directly managed HV hosts (direct agents). CloudStack sends requests to directly managed HV hosts (direct agents) using the direct agent thread pool. The size of the pool is determined by the global config direct.agent.pool.size, which defaults to 500. Currently there is no restriction on the number of threads a direct agent can use from this shared thread pool to send requests to the host. This is fine as long as the host responds to requests in a reasonable amount of time. But if there is a considerable delay in getting a response, the thread remains blocked for that long. As more commands are sent to the slow host, more threads get blocked. This can eventually lead to a situation where requests to healthy hosts cannot be processed because there are not enough free threads. The problem being addressed here is to localize the impact of a few bad hosts, so that the entire management server is not affected. One way to do this is to throttle based on the # of outstanding requests per host. The maximum number of outstanding requests to a host will be a % of the direct agent pool size, configurable via direct.agent.thread.cap. The default value is 0.1 or 10%; a value of 1 means the old behavior where there is no upper cap. This ensures that an impacted host is bound by an upper cap on the number of threads it can use to process requests, rather than the entire pool.	2013-11-04 14:52:26 +05:30
Darren Shepherd	f62e28c1ec	New Transaction API. Introduction of a new Transaction API that is more consistent with the style of Spring's transaction management. The existing Transaction class was renamed to TransactionLegacy. All of the non-DAO code in the management server has been updated to use the new Transaction API.	2013-10-16 09:21:00 -07:00
Marcus Sorensen	4e0e7410e9	Store agent hostname in attache, print it in logs wherever possible. This was discussed on the mailing list as a useful debugging tool. Currently the log prints the DB id of the agent, so admins have to look it up to know where the Command was run.	2013-10-14 11:46:01 -06:00
Darren Shepherd	aed5e9dc2a	Add Managed Context framework. The managed context framework provides a simple way to add logic to ACS at the various entry points of the system. As threads are launched and run, listeners can be registered for onEntry or onLeave of the managed context. This framework will be used specifically to handle DB transaction checking and setting up the CallContext. This framework is needed to transition away from ACS custom AOP to Spring AOP.	2013-10-02 13:09:52 -07:00
Alex Huang	b60eef3e82	Added comments and finished off the work	2013-09-28 07:53:28 -07:00
Alex Huang	e8cac2c5d8	Changed SearchCriteria2 to GenericQueryBuilder to reflect the same placement	2013-09-28 07:53:26 -07:00
Alex Huang	e2988902c9	Changed SearchCriteria2 to GenericQueryBuilder to reflect the same placement	2013-09-28 07:53:25 -07:00
Alex Huang	af8832f6bd	Unified both the SearchBuilder and SearchCriteriaService	2013-09-28 07:53:24 -07:00
Koushik Das	ae181afb00	CLOUDSTACK-4636: In a scaled-up setup, all VMs in a cluster were stopped and/or started after management server restart. The issue happens because more than one thread processes connect for a host simultaneously. The VM full sync is not designed to work in this scenario, and as a result user VMs may get stopped incorrectly. The direct agent scan task runs at regular intervals (direct.agent.scan.interval, defaulting to 90 secs) and identifies hosts that need to be processed for connect. In a normal scenario, hosts mostly get connected within that interval and there are no issues. But if for some reason the connect process takes longer and has not completed by the time the next agent scan runs, the same hosts may get picked up again based on the db state. There will then be situations where more than one thread is processing connect for the same host. The fix is to check whether a thread is already processing connect for a host, in which case all subsequent threads for that host simply bail out. There may also be a scenario where one thread has already completed processing connect but another thread was scheduled before that and would repeat the same work. This is also prevented by appropriate checks.	2013-09-10 17:21:36 +05:30
Alex Huang	a05ec6df33	Fixed up the agent separation. Added comments for config packaging.	2013-09-06 15:40:39 -07:00
Alex Huang	8f556e6d88	Made changes to configuration. Eliminated ConfigValue and only use ConfigKey	2013-09-06 15:40:38 -07:00
Alex Huang	b8e79c30a8	Compile complete	2013-09-06 15:40:37 -07:00
Alex Huang	435e74e914	Commit to try something on removing getZone	2013-09-06 15:40:33 -07:00

41 Commits
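The 'host' setting change in 366d82e292 (a comma-separated list of management server IPs, handed to each new agent in randomized order and tried in that order) can be sketched in Python as below. This is not the `NioClient` code: `agent_host_list`, `connect_in_order` and the `connect` callback are hypothetical names, and the callback stands in for a real connection attempt.

```python
import random
from typing import Callable, List, Optional


def agent_host_list(host_setting: str, rng: random.Random) -> List[str]:
    """Parse the comma-separated 'host' value and shuffle it for one agent."""
    hosts = [h.strip() for h in host_setting.split(",") if h.strip()]
    rng.shuffle(hosts)  # each agent gets its own random order
    return hosts


def connect_in_order(hosts: List[str], connect: Callable[[str], bool]) -> Optional[str]:
    """Try management servers in the agent's order; return the first that accepts."""
    for host in hosts:
        if connect(host):
            return host
    return None  # no management server reachable
```

Because every agent shuffles independently, agents spread across the management servers without a TCP load balancer, and an agent whose first server is down simply falls through to the next entry.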
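Commit 366d82e292 also describes how the 'root' CA plugin treats connecting clients: revoked certificates are rejected during the SSL handshake, `ca.plugin.root.auth.strictness` decides whether a client must present a certificate at all, and `ca.plugin.root.allow.expired.cert` decides whether expired certificates are accepted. A hypothetical decision function combining those three rules, assuming a certificate is reduced to a serial number and an expiry time; real validation would also verify the chain against the CA certificate, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass(frozen=True)
class ClientCert:
    """Hypothetical reduced view of a client certificate."""
    serial: int
    not_after: datetime


def accept_handshake(cert: Optional[ClientCert], revoked_serials: Set[int],
                     auth_strictness: bool, allow_expired: bool,
                     now: datetime) -> bool:
    """Return True if the client may complete the SSL handshake."""
    if cert is None:
        # No client certificate: acceptable only with one-way SSL (strictness off).
        return not auth_strictness
    if cert.serial in revoked_serials:
        # Revoked certificates are rejected at handshake time.
        return False
    if cert.not_after < now and not allow_expired:
        return False
    return True
```

Because the check runs only at handshake time, revoking a serial does not affect an agent that is already connected, which matches the behavior the commit describes.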
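For the ipmitool-based driver in a5de2714e9, the following is a sketch of how power operations might be turned into an ipmitool command line. `-I lanplus` selects ipmitool's IPMI 2.0 LAN interface, `-H`/`-U`/`-P` give the BMC address, user and password, and `chassis power <action>` is ipmitool's power-control subcommand. The operation names in `POWER_ACTIONS` are hypothetical, and the function only builds the argument list; it does not run anything.

```python
from typing import Dict, List

# Hypothetical mapping from driver power operations to "chassis power" actions.
POWER_ACTIONS: Dict[str, str] = {
    "STATUS": "status",
    "ON": "on",
    "OFF": "off",
    "CYCLE": "cycle",
    "RESET": "reset",
    "SOFT": "soft",
}


def ipmitool_argv(address: str, user: str, password: str, operation: str) -> List[str]:
    """Build (but do not run) an ipmitool command for an IPMI 2.0 power operation."""
    action = POWER_ACTIONS[operation.upper()]  # KeyError for unknown operations
    return [
        "ipmitool", "-I", "lanplus",
        "-H", address, "-U", user, "-P", password,
        "chassis", "power", action,
    ]
```

The list can be passed to `subprocess.run(argv, capture_output=True, text=True)`, and for local testing the commit points to `ipmisim` as a simulated BMC. Passing the password with `-P` makes it visible in the process list.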
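Commits 8ce6eba549 (CLOUDSTACK-7415) and d5a8f1d875 (CLOUDSTACK-7553) both concern per-agent ping bookkeeping on a management server. The PingTask should refresh an entry only if it is already present, and cached entries should be cleaned up when an agent connects back to a different management server. A minimal in-memory sketch of those two rules; `PingMap` and its methods are hypothetical, not CloudStack's implementation.

```python
import threading
import time
from typing import Dict, Optional


class PingMap:
    """Hypothetical map of host id -> last ping time kept by one management server."""

    def __init__(self) -> None:
        self._last_ping: Dict[int, float] = {}
        self._lock = threading.Lock()

    def on_connect(self, host_id: int, now: Optional[float] = None) -> None:
        """Start tracking an agent that connected to this server."""
        with self._lock:
            self._last_ping[host_id] = time.monotonic() if now is None else now

    def ping_task_update(self, host_id: int, now: float) -> bool:
        """Refresh a host's entry only if this server already tracks it."""
        with self._lock:
            if host_id not in self._last_ping:
                return False
            self._last_ping[host_id] = now
            return True

    def on_agent_moved(self, host_id: int) -> None:
        """Drop cached state when the agent reconnects to a different server."""
        with self._lock:
            self._last_ping.pop(host_id, None)

    def last_ping(self, host_id: int) -> Optional[float]:
        with self._lock:
            return self._last_ping.get(host_id)
```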
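Commits 269a4ef11e (CLOUDSTACK-4855) and d5754d9101 (CLOUDSTACK-6740) together describe a per-host cap of `direct.agent.pool.size × direct.agent.thread.cap` outstanding requests, with user/admin tasks queued and cron/ping tasks rejected once a host reaches its cap. A minimal single-process Python sketch of that policy, assuming a simple counter per host; `HostThrottle`, `submit` and `release` are hypothetical names, whereas the commits describe dedicated Java thread pools.

```python
import threading
from collections import deque
from typing import Deque, Dict, Optional


class HostThrottle:
    """Hypothetical per-host cap on outstanding direct-agent requests."""

    def __init__(self, pool_size: int, thread_cap: float) -> None:
        # The cap is a fraction of the shared pool, e.g. int(500 * 0.1) == 50,
        # but never below one slot.
        self.limit = max(1, int(pool_size * thread_cap))
        self._outstanding: Dict[int, int] = {}
        self._queued: Dict[int, Deque[str]] = {}
        self._lock = threading.Lock()

    def submit(self, host_id: int, task: str, internal: bool) -> str:
        """Return 'run', 'queued' or 'rejected' for a newly submitted task."""
        with self._lock:
            busy = self._outstanding.get(host_id, 0)
            if busy < self.limit:
                self._outstanding[host_id] = busy + 1
                return "run"
            if internal:
                # Cron/ping tasks are regenerated periodically, so they are dropped.
                return "rejected"
            # User/admin tasks wait until a slot for this host frees up.
            self._queued.setdefault(host_id, deque()).append(task)
            return "queued"

    def release(self, host_id: int) -> Optional[str]:
        """Mark one running task on host_id as finished.

        Returns the queued task that takes over the freed slot, or None.
        """
        with self._lock:
            queue = self._queued.get(host_id)
            if queue:
                return queue.popleft()
            self._outstanding[host_id] -= 1
            return None
```

With the defaults from 269a4ef11e (pool size 500, cap 0.1) each host gets 50 slots; with the 1.0 default proposed in 54e9a98e8b, a single host may use the whole pool, which is the old uncapped behavior.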
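Commit f445274ed3 adds a config switch that makes the management server throw if agent communication happens inside a DB transaction. The sketch below shows the idea in Python with `contextvars`; `transaction`, `send_to_agent` and `CHECK_TXN_ENABLED` are hypothetical stand-ins for CloudStack's transaction API and config key.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the new config switch.
CHECK_TXN_ENABLED = True

# Nesting depth of open DB transactions in the current context.
_txn_depth = contextvars.ContextVar("txn_depth", default=0)


@contextmanager
def transaction():
    """Mark the current context as being inside a DB transaction."""
    token = _txn_depth.set(_txn_depth.get() + 1)
    try:
        yield
    finally:
        _txn_depth.reset(token)


def send_to_agent(command: str) -> str:
    """Refuse to send a command to an agent while a DB transaction is open."""
    if CHECK_TXN_ENABLED and _txn_depth.get() > 0:
        raise RuntimeError(f"{command} sent to agent inside a DB transaction")
    return f"answer:{command}"
```

Unlike a Java `assert`, which only fires when the JVM runs with `-ea`, an explicit config-driven check like this is active whenever the switch is on, addressing the problem the commit describes after the move to Maven.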
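Commit 6f3688d13d orders the creationMonitors list by priority and appends non-priority entries at the end. A sketch assuming priority is a simple boolean flag, so priority monitors are inserted at the front (newest first); `CreationMonitors` and `register` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Hypothetical monitor signature: receives a startup payload, returns whether it handled it.
Monitor = Callable[[str], bool]


class CreationMonitors:
    """Hypothetical registry: priority monitors go first, others are appended."""

    def __init__(self) -> None:
        self._monitors: List[Tuple[int, Monitor]] = []
        self._next_id = 0

    def register(self, monitor: Monitor, priority: bool) -> int:
        self._next_id += 1
        entry = (self._next_id, monitor)
        if priority:
            self._monitors.insert(0, entry)  # jump ahead of existing monitors
        else:
            self._monitors.append(entry)     # no priority: end of the list
        return self._next_id

    def order(self) -> List[int]:
        """Return monitor ids in the order they would be consulted."""
        return [monitor_id for monitor_id, _ in self._monitors]
```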
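Commit aed5e9dc2a describes listeners registered for onEntry and onLeave of a managed context that wraps work at the system's entry points. A minimal Python sketch of that pattern; `ManagedContext`, `on_enter` and `on_leave` are hypothetical names. Leaving listeners in reverse order, and only those that entered successfully, is an assumed design choice, not something the commit states.

```python
from typing import Callable, List, Protocol


class ContextListener(Protocol):
    def on_enter(self) -> None: ...
    def on_leave(self) -> None: ...


class ManagedContext:
    """Hypothetical runner that wraps a task with registered enter/leave hooks."""

    def __init__(self) -> None:
        self._listeners: List[ContextListener] = []

    def register(self, listener: ContextListener) -> None:
        self._listeners.append(listener)

    def run(self, task: Callable[[], object]) -> object:
        entered: List[ContextListener] = []
        try:
            for listener in self._listeners:
                listener.on_enter()
                entered.append(listener)
            return task()
        finally:
            # Unwind in reverse order, only for listeners whose on_enter succeeded.
            for listener in reversed(entered):
                listener.on_leave()
```

A transaction-checking listener and a CallContext-setup listener, the two uses the commit names, would each be registered once and then wrap every task run through the context.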
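The fix in ae181afb00 (CLOUDSTACK-4636) lets only one thread process connect for a given host and makes late-scheduled threads bail out if connect has already completed. A minimal sketch with a lock-protected in-progress set; `ConnectGuard` is hypothetical, and `already_connected` stands in for whatever host state the caller reads from the database.

```python
import threading
from typing import Set


class ConnectGuard:
    """Hypothetical guard that lets one thread at a time process connect for a host."""

    def __init__(self) -> None:
        self._in_progress: Set[int] = set()
        self._lock = threading.Lock()

    def try_begin(self, host_id: int, already_connected: bool) -> bool:
        """Return True if the caller should go ahead and process connect."""
        with self._lock:
            # Bail out if another thread is mid-connect for this host, or if an
            # earlier thread already finished connect while this one was queued.
            if already_connected or host_id in self._in_progress:
                return False
            self._in_progress.add(host_id)
            return True

    def finish(self, host_id: int) -> None:
        """Release the host so a later scan can process it again if needed."""
        with self._lock:
            self._in_progress.discard(host_id)
```

A scan thread would call `try_begin` before processing connect and `finish` in a `finally` block afterwards, so a failed connect does not leave the host permanently marked as in progress.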