Commit Graph

214 Commits

Author SHA1 Message Date
Rohit Yadav 366d82e292 FR12 (CLOUDSTACK-9993): Secure Agent Communications (#38)
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.

This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.

Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
  global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
  radomized comma separated list to which they will attempt connection
  or reconnection in provided order. This removes need of a TCP LB on
  port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
  connecting agents will be required to present certificates issued
  by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
  authentication and connecting agents will not be required to present
  certificates.
- A script `keystore-setup` is responsible for initial keystore setup
  and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
  certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
  SSH, and later provisioning is handled via an existing agent connection
  using command-answers. The supported clients and agents are limited to
  CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
  connection, however rejects a revoked certificate used during SSL
  handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
  be used by mgmt server(s) for SSL negotiations and handshake. New
  keystores will be named `cloud.jks`, any additional SSL certificates
  should not be imported in it for use with tomcat etc. The `cloud.jks`
  keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
  the validity of them are same as the CA certificates.

New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial

Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates

Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed

UI changes:
- Button to download/save the CA certificates.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-09-26 09:19:31 +05:30
Abhinandan Prateek 775e73c38e CLOUDSTACK-9182: Some running VMs turned off on manual migration when auto migration failed while host preparing for maintenance. 2017-04-18 11:25:24 +05:30
Rohit Yadav 876fc7434d APPLE-165: Host HA management and HA provider for KVM
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.

The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.

The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.

The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-01-18 18:18:53 +05:30
Rohit Yadav a5de2714e9 CLOUDSTACK-9299: Out-of-band Management for CloudStack
Support access to a host’s out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.

Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.

This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host

For testing this feature `ipmisim` can be used:
https://pypi.python.org/pypi/ipmisim

FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-05-10 13:16:03 +05:30
Rohit Yadav e51f524039 CLOUDSTACK-9348: Use non-blocking SSL handshake
- Uses non-blocking SSL handshake and non-blocking connections
- Uses 60s as timeout for both client/server to guard against indefinitely
  blocking clients
- Unit test to prove fix, client and malicious clients trying to connect to server

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-05-06 10:07:47 +05:30
Daan Hoogland 84e9ff746b CLOUDSTACK-8848: added null pointer guard to new public method 2016-01-20 12:54:04 +01:00
Rene Moser f7e1029cfc CLOUDSTACK-8848: ensure power state is up to date when handling missing VMs in powerReport
There 2 things which has been changed.

* We look on power_state_update_time instead of update_time. Didn't make sense to me at all to look at update_time.
* Due DB update optimisation, powerState will only be updated if < MAX_CONSECUTIVE_SAME_STATE_UPDATE_COUNT. That is why we can not rely on these information unless we make sure these are up to date.
2016-01-20 12:54:03 +01:00
Abhinandan Prateek c21aa89a47 CLOUDSTACK-8491: Host maintenance fails if a vm on it is running a custom service offering VM 2015-05-21 10:40:59 +05:30
Likitha Shetty e1db982d6b CLOUDSTACK-8410. ESXi host stuck disconnects frequently.
During ping task, while scanning and updating status of all VMs on the host that are stuck in a transitional state
and are missing from the power report, do so only for VMs that are not removed.

(cherry picked from commit de7173a0ed)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-04-29 16:50:40 +02:00
Santhosh Edukulla 86943da26e Fixed few coverity issues
Signed-off-by: Santhosh Edukulla <santhosh.edukulla@gmail.com>
(cherry picked from commit 0a9742f914)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-04-27 14:43:48 +02:00
MS b34202a84e CLOUDSTACK-8387 - Close mgmt server peer socket on failure, without relying on autoclose 2015-04-15 09:50:32 -07:00
MS fde2615c33 CLOUDSTACK-8387 - Close mgmt server peer socket on failure, without relying on autoclose 2015-04-15 08:43:37 -07:00
MS cb7bcf23fe CLOUDSTACK-8387 - Don't autoclose new mgmt server peer connections as soon as they open 2015-04-15 08:18:24 -07:00
Rohit Yadav 43db75c319 CLOUDSTACK-7593: allow nic type to be fetched from vm's details
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-04-13 15:44:09 +05:30
Daan Hoogland 8ad2e309a4 CLOUDSTACK-8238 handling of retry ping improved
Fixed on 4.4 and master but not on 4.5, cherry-picked on 4.5 using commit
fbafc957dc

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Conflicts:
	engine/orchestration/src/com/cloud/agent/manager/DirectAgentAttache.java
2015-02-16 11:35:41 +05:30
Santhosh Edukulla 78bfaa79cf Fixed few coverity issues like invalid boxing unboxing issues, resource leaks, null dereferences
(cherry picked from commit ef6ec7b276)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-06 16:50:20 +05:30
Rohit Yadav fb1069ace9 agent: don't investigate if host is null, send alert instead
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-05 16:42:56 +05:30
Rohit Yadav e40b06e9ca AgentAttache: allow checkonhost command in maintenance, cancel if only allowed
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-05 16:29:44 +05:30
Santhosh Edukulla c25263ba81 Fixed Coverity Issues 2015-02-05 15:59:29 +05:30
Devdeep Singh 0e4d91aa91 CLOUDSTACK-6924. To attach a volume if a volume needs to be moved to another storage
pool, the source and destination pools cannot be local and cluster/zone and vice versa.
Cloudstack detects it and throws a exception. However, the end user only sees an
unexpected exception and not the reason for failure. Fixed it by making sure the
reason for the failure is correctly captured and shown to the end user.

(cherry picked from commit cffae8eef0)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Conflicts:
	server/src/com/cloud/storage/VolumeApiServiceImpl.java
2015-02-02 14:36:07 +05:30
Kishan Kavala 0c1172ffe9 Network offering usage event should be logged for UserVms only
(cherry picked from commit 42cecbb000)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-02 14:27:24 +05:30
Likitha Shetty 294f5bf331 CLOUDSTACK-8114. Ensure VM stop and then start updates the volume path correctly in the DB.
(cherry picked from commit 521258bafb)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-02 14:27:04 +05:30
Likitha Shetty bcbfe3bdee CLOUDSTACK-8129. Cold migration of VM across VMware DCs leaves the VM behind in the source host.
If VM has been cold migrated across different VMware DCs, then unregister the VM from source host.

(cherry picked from commit 15b348632d)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-02 13:51:47 +05:30
Likitha Shetty 45d32234a6 CLOUDSTACK-8112. CS allows creation of VM's with the same Display name when vm.instancename.flag is set to true.
Before registering a VM check if a different CS VM with same name exists in vCenter.

(cherry picked from commit 33179cce56)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-02 13:51:22 +05:30
Santhosh Edukulla 737edd90dc Fixed few coverity patches
NPE in delete firewall rules observed, cherry-picking fix from master.

(cherry picked from commit 31a42d2b7a)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-02-02 12:48:38 +05:30
Rohit Yadav debfcdef78 CLOUDSTACK-8160: use preferable protocols
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-01-21 18:02:58 +05:30
Sanjay Tripathi 8790b84b20 CLOUDSTACK-7940: Exception printed completely on the UI. Not in a readable format.
(cherry picked from commit dda2994936)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-01-20 11:34:48 +05:30
Kishan Kavala 1e87f3b80b Bug-Id: CLOUDSTACK-3439: Include dynamically created nics in Prepare for migration command in KVM
(cherry picked from commit f767adfe71)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-01-18 18:19:24 +05:30
Saksham Srivastava a1791cb4a8 CLOUDSTACK-8088: VM scale up is failing in vmware with Unable to execute ScaleVmCommand due to java.lang.NullPointerException
(cherry picked from commit 1df0453d27)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-01-18 17:28:17 +05:30
Koushik Das 788fe5a273 CLOUDSTACK-8103: Vmsync marks VM as stopped even after failing to stop it in HV
During vmsync if StopCommand (issued as part of PowerOff/PowerMissing report) fails to stop VM (since VM is running on HV),
don't transition VM state to "Stopped" in CS db. Also added a check to throw ConcurrentOperationException if vm state is not
"Running" after start operation.
2014-12-22 12:31:34 +05:30
Prachi Damle a7861aa5fa CLOUDSTACK-8079: If the cluster capacity threshold is reached, HA-enabled VM is not migrated on another host during HA
Changes:
-  When there is HA we try to redeploy the affected vm using regular planners and if that fails we retry using the special planner for HA (which skips checking disable threshold)
Now because of job framework the InsufficientCapacittyException gets masked and the special planners are not called. Job framework needs to be fixed to rethrow the correct exception.
- Also the VM Work Job framework is  not setting the DeploymentPlanner to the VmWorkJob.  So the HA Planner being passed by HAMgr was not getting used.
- Now the job framework sets the planner passed in by any caller of the VM Start operation, to the job
2014-12-17 13:48:24 -08:00
Rohit Yadav d42e3df9cf CLOUDSTACK-7563: Fix potential NPE in checking answer
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2014-12-02 16:14:29 +05:30
Loic Lambiel 99cb19787e CLOUDSTACK-7404: Failed to start an instance when originating template has been deleted
Signed-off-by: Sebastien Goasguen <runseb@gmail.com>
(cherry picked from commit c1bf7eeeee)
2014-12-01 13:05:12 +01:00
Anthony Xu ab19edf09d CLOUDSTACK-7742:
root cause:
when vmsync reports system VM is down, CCP doesn't release the VM resource before starting it.
fix:
make sure cleanup is called for a VM when it is reported as Stopped
2014-11-19 16:27:51 -08:00
Edison Su d856a2acad CLOUDSTACK-7946:
remove leftover state in volume and snapshot table in case of mgt server
shutdown during storage operation.
Reviewed-by: Min
2014-11-19 16:08:27 -08:00
Koushik Das e25de54b4c CLOUDSTACK-7421
Unnecessary exception in MS logs while removing default NIC from VM. Following changes are made:
1. Changed the exception from CloudRuntimeException to InvalidParameterValueExecption.
2. Moved out validation logic to UserVMManagerImpl from VirtualMachineManagerImpl.
3. Handling InvalidParameterValueException from async API calls so that they are not logged as ERROR in MS logs.
2014-11-08 13:50:15 +05:30
Min Chen 07ba078ee6 CLOUDSTACK-7833: VM Async work jobs log "Was unable to find lock for the key vm_instance" errors as warning 2014-11-03 11:19:06 -08:00
Min Chen 0d6f69b536 CLOUDSTACK-7778: Start VM checkWorkItem loop should also check VM DB state before going into idle waiting to exit faster. 2014-10-23 14:38:56 -07:00
Anthony Xu c52e14730e when host is pingtimeout and CCP can not determine the host status, put the host to Alert status , no VM HA. 2014-10-22 15:07:40 -07:00
Anthony Xu 0141b37784 CLOUDSTACK-7761:
Revert "when system VM ping times out, stop system VM"

This reverts commit ee23be1942.
2014-10-21 17:21:17 -07:00
Min Chen e7fa3a2959 CLOUDSTACK-7563: Fix potential NPE from FingBugs. 2014-10-14 11:21:01 -07:00
Edison Su 1c1485e0f0 disable parallel for xenserver. Also for vmware, if full.clone is enabled and migratecommand will have the behavor of start/stop command
(cherry picked from commit d233f39c82)
2014-10-13 00:39:33 -04:00
Anthony Xu ee23be1942 when system VM ping times out, stop system VM
(cherry picked from commit 847e1e47ae)
2014-10-13 00:11:21 -04:00
Devdeep Singh 43a9bbf509 CLOUDSTACK-7598: When a vm deployed by cloudstack is stopped on the hypervisor
(outside cloudstack), the state of the vm is not updated in cloudstack db. The
ping task was not checking for resource (host) status by default. The power
state of the vms is returned as part of the resource status. Fixed the issue by
making sure ping task atleast tries once to get the resource status.

(cherry picked from commit 55b4ead495)
2014-10-12 23:55:17 -04:00
Hugo Trippaers 8a13f44b44 Remove duplicate field in constructor 2014-09-18 16:02:26 +02:00
Daan Hoogland 7f440854f7 CLOUDSTACK-7184 retry-wait loop config to deal with network glitches
(cherry picked from commit a29f954a26)

Conflicts:
	engine/orchestration/src/com/cloud/agent/manager/DirectAgentAttache.java
2014-09-18 12:55:05 +02:00
Rohit Yadav 2cd99e3e72 CID-1116245: Lock for getting answers from SynchronousListener
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2014-09-18 11:56:29 +02:00
Rohit Yadav 3640c61207 CID-1116248: Synchornize method in SynchronousListener
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2014-09-18 11:54:31 +02:00
Anthony Xu e5a91e40dd in tagCommand, AsyncJobExecutionContext doesn't need to be created if it doesn't exist 2014-09-17 18:15:41 -07:00
Min Chen 1b15efb5f0 CLOUDSTACK-7563: ClassCastException in VirtualMachineManagerImpl in
handling various Agent command answer.
2014-09-16 12:12:17 -07:00