Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Support access to a host’s out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.
Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.
This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host
For testing this feature `ipmisim` can be used:
https://pypi.python.org/pypi/ipmisim
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This reverts commit cd7218e241, reversing
changes made to f5a7395cc2.
Reason for Revert:
noredist build failed with the below error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project cloud-plugin-hypervisor-vmware: Compilation failure
[ERROR] /home/jenkins/acs/workspace/build-master-noredist/plugins/hypervisors/vmware/src/com/cloud/hypervisor/guru/VMwareGuru.java:[484,12] error: non-static variable logger cannot be referenced from a static context
[ERROR] -> [Help 1]
even the normal build is broken as reported by @koushik-das on dev list
http://markmail.org/message/nngimssuzkj5gpbz
- Backend should update if state was diabled and now has changed
- UI's fetch latest does not actually fetch latest
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 985a61652e)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When sendAlert is called on an AlertManager impl, if it fails it logs that
something was wrong but does not log the body of the issue/error. This means
we tell the user/admin that there was an issue but don't share the "issue"
with them at all as the email alert fail (or that they were not initialized).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 885c02dbd8)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Conflicts:
server/src/com/cloud/alert/AlertManagerImpl.java
usage/src/com/cloud/usage/UsageAlertManagerImpl.java
The managed context framework provides a simple way to add logic
to ACS at the various entry points of the system. As threads are
launched and ran listeners can be registered for onEntry or onLeave
of the managed context. This framework will be used specifically
to handle DB transaction checking and setting up the CallContext.
This framework is need to transition away from ACS custom AOP to
Spring AOP.
Addition of two new resource types i.e. Primary and Secondary storage space in the existing pool of
resource types.
Added methods to set the limits on these resources using updateResourceLimit
API command and to get a count using updateResourceCount. Also added calls in the
Templates, Volumes, Snapshots life cycle to check these limits and to increment/decrement the new
resource types
Resource Name :: Resource type number
Primary Storage 10
Secondary Storage 11
Also added jUnit Tests for the same.
Reviewed by : nitin mehta<nitin.mehta@citrix.com>
This feature will provide the functionality to delete or archive
the Alerts/Events.
Delete or archive alerts APIs are available only for root-admin,
delete or archive events are available for regular users.
following changes
- introduced notion of event bus with publish, subscribe, unsubscribe
semantics
- a plug-in can implement the EventBus abstraction to provide event
bug to CloudStack
- A rabbitMQ based plug-in that can interact with AMQP servers to
provide message broker based event-bug
- stream lines, action events, usage events, alerts publishing in to
convineance classed which are also used to publish corresponding
event on to event bus
- introduced notion of state change event. On a state change, in the
state machine corrsponding to the resource, a state change event is
published on the event bug
- associated a state machined with Snapshot and Network objects
- Virtual Machine, Volume, Snaphost, Network object state changes wil
result in a state change event