Commit Graph

25636 Commits

Author SHA1 Message Date
Rohit Yadav 9bc90afdf3 FR3: Host-HA backported changes from master
- Improves job scheduling using state/event-driven logic
- Reduces database and CPU load by reducing all background threads to one
- Improves Simulator and KVM host-ha integration tests
- Triggers VM HA on successful host (ipmi reboot) recovery
- Improves internal data structures and checks around HA counter
- New FSM events to retry fencing and recovery
- Fixes KVM activity script to aggressively check against last update time

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-09-01 11:42:52 +02:00
Rohit Yadav 0f0e7f2011 FR12 (CLOUDSTACK-9993): Secure Agent Communications
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself into `NioServer` to handle agent connections securely. The
framework also makes `NioClient` assume that a keystore, if available
under the known name `cloud.jks`, will be used for SSL negotiations and
handshake.

This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificates to CloudStack agents such as
the KVM, CPVM and SSVM agents, and also to the management server for
peer clustering.

Additional changes and notes:
- A comma-separated list of management server IPs can be set in the
  'host' global setting. Newly provisioned agents (KVM/CPVM/SSVM etc)
  will get a randomized comma-separated list to which they will attempt
  connection or reconnection in the provided order. This removes the
  need for a TCP LB on port 8250 (default) of the management server(s).
- All fresh deployments will enforce two-way SSL authentication where
  connecting agents will be required to present certificates issued
  by the 'root' CA plugin.
- Existing environments will, on upgrade, continue to use one-way SSL
  authentication and connecting agents will not be required to present
  certificates.
- A script `keystore-setup` is responsible for initial keystore setup
  and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for importing the
  provided certificate payload into the Java keystore file.
- Agent security (keystore, certificates etc) is set up initially using
  SSH; later provisioning is handled via an existing agent connection
  using command-answers. The supported clients and agents are limited to
  CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not terminate an existing agent-mgmt
  server connection, but a revoked certificate is rejected when used
  during SSL handshake.
- The older `cloudstackmanagement.keystore` is deprecated and will no
  longer be used by mgmt server(s) for SSL negotiations and handshake.
  New keystores will be named `cloud.jks`. Additional SSL certificates
  should not be imported into it for use with tomcat etc. The
  `cloud.jks` keystore is strictly used for agent-server communications.
- Management server keystores are validated and renewed on startup only;
  their validity is the same as that of the CA certificates.
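
The randomized 'host' list and in-order connection attempts described in
the notes above can be sketched roughly as follows. This is a minimal
illustration, not the agent's actual code: the function names
`randomized_host_list` and `connect_in_order`, and the `try_connect`
callback, are hypothetical.

```python
import random


def randomized_host_list(hosts_setting, rng=None):
    """Shuffle the comma-separated 'host' setting into a new comma-separated list."""
    rng = rng or random.Random()
    hosts = [h.strip() for h in hosts_setting.split(",") if h.strip()]
    rng.shuffle(hosts)
    return ",".join(hosts)


def connect_in_order(host_list, try_connect):
    """Try each host in the given order; return the first that connects, else None.

    try_connect is a hypothetical callback that returns True on success.
    """
    for host in host_list.split(","):
        if try_connect(host):
            return host
    return None
```

Because each agent receives its own shuffled order, connections would
tend to spread across management servers without a load balancer.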

New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial

Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates

Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issuer distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed

UI changes:
- Button to download/save the CA certificates.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-08-23 12:42:59 +02:00
Abhinandan Prateek 327360279d Fr17b (#41)
* FR17: 1. Add timeout to the volume stats command
2. When an unknown command is received, return a BadCommand from the request processor

* FR17: Unit test for checking a bad and a good command sent to the agent as JSON
2017-08-10 14:39:36 +02:00
Rohit Yadav a9f268b52b java: add java 1.7 version for jenv
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-08-06 18:41:10 +02:00
dahn 576b4c7c27 FR23 pluggable isolation methods
This is brought to the public version as CLOUDSTACK-10007
2017-07-28 16:28:38 +02:00
Rohit Yadav c1118c2a4e FIX3: Consider overcommit ratios with total/threshold values for host metrics
Consider the CPU and memory overcommit ratios with total cpu/ram values
or thresholds for host metrics. This will fix incorrect notification
(cells turning yellow/red) in the metrics view.
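
As a rough illustration of the idea, the sketch below scales the total
by the overcommit ratio before comparing usage against thresholds. The
function name `usage_level` and the threshold values 0.75 and 0.95 are
assumptions made for this example, not values taken from the change.

```python
def usage_level(used, total, overcommit_ratio=1.0, warn=0.75, critical=0.95):
    """Classify usage against capacity scaled by the overcommit ratio.

    warn and critical are hypothetical fractions of the effective capacity.
    """
    capacity = total * overcommit_ratio
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction = used / capacity
    if fraction >= critical:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"
```

With 10 units allocated on a host with 8 physical units, ignoring the
ratio flags the host as critical, while a ratio of 2.0 gives an
effective capacity of 16 units and the cell stays normal.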

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-07-24 12:38:28 +02:00
Rohit Yadav fe88eecfd6 Revert "FIX3: Consider overcommit ratios with total/threshold values"
This reverts commit 9c82452b82.
2017-07-22 13:13:25 +02:00
dahn 3caef4487e Merge pull request #42 from shapeblue/fr13-annotations
annotations (on hosts)
2017-07-13 10:31:41 +02:00
Boris 5a229b369f Adding marvin tests 2017-07-13 10:30:33 +02:00
Daan Hoogland 09173a4466 annotations on hosts 2017-07-13 10:29:51 +02:00
Rohit Yadav b539b48a69 FIX2: Allow creation of roles with names of deleted roles
This allows admins to create roles with names of previously deleted
roles.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-07-11 14:06:24 +05:30
Rohit Yadav 9c82452b82 FIX3: Consider overcommit ratios with total/threshold values
Consider the CPU and memory overcommit ratios with total cpu/ram values
or thresholds for host and cluster metrics. This will fix incorrect
notification (cells turning yellow/red) in the metrics view.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-07-11 12:52:48 +05:30
Rohit Yadav 2590c2521a CLOUDSTACK-9983: Hide credentials in listClusters response
This removes username and password details from the listClusters
response. The details are usually seen in VMware environments only.
With the dynamic roles feature, the listClusters API may be provided
to a read-only root-admin user role/type which should not be able to get
the credentials.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-07-05 13:32:45 +05:30
Abhinandan Prateek c5e9e98ab5 FR17-b: The following enhancements are made to FR17
1. Add timeout to the volume stats command
2. When an unknown command is received, return a BadCommand from the request processor
3. Unit test for checking a bad and a good command sent to the agent as JSON
2017-06-14 08:40:41 +05:30
Rohit Yadav 986497d891 FR20: Allow native CloudStack users to change password from the UI
This allows native CloudStack users to change password from the UI.
Overall changes:
- New 'usersource' key returned in the listUsers API
- Removed ldap specific check from the UI, added checks based on usersource
- Native CloudStack users will be allowed to change password from the UI

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-05-30 14:23:38 +05:30
Rohit Yadav bf7efaa98d cw1314: Fix high CPU deviation issues seen in metrics view
HostStats returns CPU usage as a percentage but memory usage in bytes.
This fixes a regression in the maximum CPU usage deviation calculation,
which did not assume the values to be percentages and multiplied the
final ratios by 100, yielding 100x the actual deviation value.
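
One way such a 100x inflation can arise is sketched below. The exact
formula used by the metrics view is not shown in this message, so the
deviation measure (largest value minus the mean) and both function names
are assumptions made for illustration only.

```python
def max_deviation_assuming_fractions(values):
    """Scale by 100 as if values were fractions (0.0 to 1.0); wrong for percentages."""
    mean = sum(values) / len(values)
    return (max(values) - mean) * 100


def max_deviation_of_percentages(values):
    """Values are already percentages, so no extra scaling is applied."""
    mean = sum(values) / len(values)
    return max(values) - mean
```

For CPU usage values of 40, 50 and 60 percent, the second function
reports a deviation of 10 percentage points, while the first reports
1000.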

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-05-10 12:42:12 +05:30
dahn b70140c1bd Merge pull request #30 from shapeblue/nio-logging
logging improvements
2017-05-09 13:55:11 +01:00
Rohit Yadav 5d58f34a4f Merge pull request #36 from shapeblue/cw-1300
CW1300: Fix hyperv log4j transformation
2017-04-27 14:59:58 +05:30
Rohit Yadav 43a1c9e24a CW1300: Fix hyperv log4j transformation
Fixes log4j transformation using replace.properties to translate
@AGENTLOG@ to a valid value during rpm/mvn build.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-04-25 15:21:25 +05:30
Abhinandan Prateek 8c433b2307 Merge pull request #35 from shapeblue/9182
CLOUDSTACK-9182: Some running VMs turned off on manual migration when…
2017-04-25 11:18:01 +05:30
Rohit Yadav f89d06b0f6 Merge pull request #34 from shapeblue/fr19-oobm-plugin-cloudstack
APPLE-333: Oobm plugin for nested-cloudstack environments
2017-04-21 12:26:19 +05:30
Rohit Yadav 8f3cd943b1 APPLE-333: Oobm plugin for nested-cloudstack environments
This implements an out-of-band management plugin for nested-cloudstack
environments where the hypervisor host is a VM in a parent CloudStack environment
that is used as a host in the (testing) CloudStack environment. This plugin
allows power operations to translate into start/stop/reboot of the VM (host).

The accepted out-of-band management configuration fields are:
- Address: The API URL of the parent CloudStack environment
- Port: The uuid of the (host) VM in the parent CloudStack environment
- Username: The apikey of the user account who has ownership on the (host) VM
- Password: The secretkey of the user account who has ownership on the (host) VM
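
The field reuse listed above can be pictured as a simple mapping. This
is only a sketch of the mapping itself; the class
`NestedCloudStackTarget` and the function `from_oobm_config` are
hypothetical names and not part of the plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedCloudStackTarget:
    """Parent-environment details derived from the standard OOBM fields."""
    api_url: str
    vm_uuid: str
    api_key: str
    secret_key: str


def from_oobm_config(address, port, username, password):
    """Reinterpret the OOBM address/port/username/password as described above."""
    return NestedCloudStackTarget(
        api_url=address,      # Address holds the parent API URL
        vm_uuid=port,         # Port holds the (host) VM uuid
        api_key=username,     # Username holds the apikey
        secret_key=password,  # Password holds the secretkey
    )
```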

Note: changing the password of the oobm interface is not supported by this plugin

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-04-19 16:58:38 +05:30
Abhinandan Prateek 775e73c38e CLOUDSTACK-9182: Some running VMs turned off on manual migration when auto migration failed while host preparing for maintenance. 2017-04-18 11:25:24 +05:30
Abhinandan Prateek 8f7428837e Merge pull request #33 from shapeblue/fr17
FR17: type conversion fix
2017-03-30 12:40:49 +05:30
Abhinandan Prateek b1c35af8c2 FR17: Metrics fix 2017-03-30 11:59:35 +05:30
Abhinandan Prateek 9b181dbf19 Merge pull request #32 from shapeblue/cw1261
CW1261: Do not reset connection for user managed connections
2017-03-29 22:54:57 +05:30
Abhinandan Prateek 6cab0308d7 Merge pull request #31 from shapeblue/fr17
FR17: list vm physical size, virtual size and utilisation in listvolume API
2017-03-29 16:29:47 +05:30
Abhinandan Prateek 4991d165f3 FR-17: KVM, Xen and VMware support + UI with Marvin test 2017-03-27 09:53:40 +05:30
Abhinandan Prateek b3f6d9136e CW1261: Do not reset connection for user managed connections 2017-03-24 12:42:41 +05:30
Daan Hoogland 0eca48ad86 logging improvements 2017-03-03 07:36:14 +01:00
Rohit Yadav 40f2f6ff45 Merge pull request #29 from shapeblue/metrics-apis-4.5
APPLE-328: Metrics View APIs
2017-02-16 13:53:27 +05:30
Rohit Yadav a00cb07ee0 APPLE-328: Metrics View APIs
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-02-16 13:20:30 +05:30
Rohit Yadav ad19e00d13 Merge pull request #13 from shapeblue/host-ha-4.5
APPLE-165: Host HA and KVM provider
2017-01-18 18:25:05 +05:30
Rohit Yadav 876fc7434d APPLE-165: Host HA management and HA provider for KVM
Host-HA offers investigation, fencing and recovery mechanisms for hosts that
are malfunctioning for any reason. It uses Activity and Health checks to
determine the current host state, based on which it may degrade a host or try
to recover it. On failing to recover it, it may try to fence the host.
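
A heavily simplified sketch of this decision flow is given below. The
state names, the `next_state` function, and the exact way health and
activity results are combined are assumptions for illustration; the real
framework's states and events are not described in this message.

```python
from enum import Enum


class HostState(Enum):
    AVAILABLE = "available"
    DEGRADED = "degraded"
    RECOVERING = "recovering"
    FENCED = "fenced"


def next_state(state, health_ok, activity_ok, recovery_ok=None):
    """Return the next (hypothetical) state from check and recovery results."""
    if state in (HostState.AVAILABLE, HostState.DEGRADED):
        if health_ok:
            return HostState.AVAILABLE
        # Unhealthy but still showing activity: degrade rather than recover.
        return HostState.DEGRADED if activity_ok else HostState.RECOVERING
    if state is HostState.RECOVERING:
        # A failed recovery attempt leads to fencing.
        return HostState.AVAILABLE if recovery_ok else HostState.FENCED
    return state
```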

The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows other hypervisor-specific providers to be
implemented in future.

The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.

The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-01-18 18:18:53 +05:30
Rohit Yadav 2b4cdd6580 Merge pull request #26 from shapeblue/oobm-ui-password-fix
APPLE-320: Bypass password validation for oobm
2016-12-10 01:34:21 +05:30
Rohit Yadav ac70308d9a Merge pull request #23 from shapeblue/apple-313-cw1078-kvmreboot
APPLE-313: Fixes for CW1078
2016-12-10 01:34:08 +05:30
Rohit Yadav e52038ba9e APPLE-313: Fix memory leak in VmwareContextPool
- Fixes synchronization issue
- Uses ConcurrentLinkedQueue

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-11-23 13:28:16 +05:30
Rohit Yadav 33f8d48e78 APPLE-320: Bypass password validation for oobm
Allows special characters, which are otherwise not allowed in password
fields throughout the CloudStack UI.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-11-21 19:00:22 +05:30
Abhinandan Prateek 635aa20058 CLOUDSTACK-9460: For long running transactions, if the connection is
timed out by the mysql server then refresh it
2016-11-21 15:09:50 +05:30
Abhinandan Prateek 066057d7c4 CLOUDSTACK-9571: fence gracefully using clustermanager's notifyNodeIsolated 2016-11-21 15:09:50 +05:30
Abhinandan Prateek 6fdd19fa7e CLOUDSTACK-9571: Fence DB if there are consecutive connection errors. 2016-11-21 15:09:50 +05:30
Rohit Yadav eecd3fb349 APPLE-313: Ulimit fixes for cloudstack-{agent, management}
Increases/sets ulimit for cloudstack agent and management. This would fix
any issues with opening more files than the permitted limit (usually 1024-4096).

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-11-08 16:20:04 +05:30
Rohit Yadav 6d79c7c5b7 Merge pull request #24 from shapeblue/cve-2016-6813
CLOUDSTACK-9544: Check access on account trying to generate user API keys
2016-10-27 21:55:16 +05:30
Marc-Aurèle Brothier ce02814901 CLOUDSTACK-9544: Check access on account trying to generate user API keys
This fixes CVE-2016-6813

Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-10-27 15:59:39 +05:30
Rohit Yadav 6f89892274 Merge pull request #22 from shapeblue/apple-base-9551
CLOUDSTACK-9551: Move java tmp dir to cloudstack-agent's path to avoid
2016-10-25 11:18:44 +05:30
Rohit Yadav 25cdb44c65 Merge pull request #21 from shapeblue/metrics-view-context-based-filtering
APPLE-309: Use context to filter items in a metrics view
2016-10-25 10:43:34 +05:30
Rohit Yadav 0841471cef Merge pull request #20 from shapeblue/roles-usage-fix
APPLE-274: Add role_id to cloud_usage.account
2016-10-25 10:42:00 +05:30
Rohit Yadav 860267fee0 Merge pull request #19 from shapeblue/kvm-host-without-storage
APPLE-272: Host Connects Without Storage
2016-10-25 10:41:42 +05:30
Abhinandan Prateek aa093659aa CLOUDSTACK-9551: Move java tmp dir to cloudstack-agent's path to avoid
noexec on /tmp
2016-10-20 12:25:21 +05:30
Rohit Yadav a5d6b55eb4 APPLE-309: Use context to filter items in a metrics view
Use the available context to filter a metrics view based on the zone, cluster,
or host in the context object. This fixes filtering when a metrics view is
opened via Zone->Compute and Storage-> for a resource.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2016-10-19 11:41:08 +05:30