- Improves job scheduling using state/event-driven logic
- Reduced database and cpu load, by reducing all background threads to one
- Improves Simulator and KVM host-ha integration tests
- Triggers VM HA on successful host (ipmi reboot) recovery
- Improves internal datastructures and checks around HA counter
- New FSM events to retry fencing and recovery
- Fixes KVM activity script to aggresively check against last update time
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Makes role permissions orderable in UI/backend
- Role permissions evaluated by fixed order
- Rules draggable in UI
- Migration script adds a default order
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows root administrators to define new roles and associate API
permissions to them.
A limited form of role-based access control for the CloudStack management server
API is provided through a properties file, commands.properties, embedded in the
WAR distribution. Therefore, customizing API permissions requires unpacking the
distribution and modifying this file consistently on all servers. The old system
also does not permit the specification of additional roles.
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+Role+Based+API+Access+Checker+for+CloudStack
DB-Backed Dynamic Role Based API Access Checker for CloudStack brings following
changes, features and use-cases:
- Moves the API access definitions from commands.properties to the mgmt server DB
- Allows defining custom roles (such as a read-only ROOT admin) beyond the
current set of four (4) roles
- All roles will resolve to one of the four known roles types (Admin, Resource
Admin, Domain Admin and User) which maintains this association by requiring
all new defined roles to specify a role type.
- Allows changes to roles and API permissions per role at runtime including additions or
removal of roles and/or modifications of permissions, without the need
of restarting management server(s)
Upgrade/installation notes:
- The feature will be enabled by default for new installations, existing
deployments will continue to use the older static role based api access checker
with an option to enable this feature
- During fresh installation or upgrade, the upgrade paths will add four default
roles based on the four default role types
- For ease of migration, at the time of upgrade commands.properties will be used
to add existing set of permissions to the default roles. cloud.account
will have a new role_id column which will be populated based on default roles
as well
Dynamic-roles migration tool: scripts/util/migrate-dynamicroles.py
- Allows admins to migrate to the dynamic role based checker at a future date
- Performs a harder one-way migrate and update
- Migrates rules from existing commands.properties file into db and deprecates it
- Enables an internal hidden switch to enable dynamic role based checker feature
Deprecate commands.properties
- Fixes apidocs and marvin to be independent of commands.properties usage
- Removes bundling of commands.properties in deb/rpm packaging
- Removes file references across codebase
Reviewed-by: John Burwell <john.burwell@shapeblue.com>
QA-by: Boris Stoyanov <boris.stoyanov@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Backported PR #678 to 4.5
This I changed:
``jasypt='/usr/share/cloudstack-common/lib/jasypt-1.9.0.jar'``
in master it is 1.9.2. I changed it to 1.9.0 here, to match the original script.
Original commit IDs:
2f858a7d08ee9b644e288a1e79f518
Signed-off-by: Remi Bergsma <github@remi.nl>
We did not verify if the packets leaving an Instance had the correct
source address.
Any IP packet not matching the Instance IP(s) will be dropped
(cherry picked from commit 3e3c11ffca)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
VLAN id 4095 is commonly used as a 'tag passthrough' in virtualization environments
(VMware, specifically). This vlan id is incompatible with Linux, but we can
allow the admin to manually configure the bridge if the same passthrough is
desired.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Router VMs don't have a chain rule with -def suffix, this fixes name and
properly removes VR vms not running on a host
- Before trying to remove dnats, filter empty/None elements from list
- destroy_ebtables_rules should check what kind of action is request to be
performed (-A for add or -D for removed) and execute based on that
- Before executing any command, log it for debugging purposes
- Method to cleanup bridge, may be used in future
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The SG python script depends on ebtables-save which is not available on Debian
based distros (Ubuntu and Debian for example). The commit uses /proc/modules
to find available bridge tables (one of nat, filter or broute) and then
find VMs that need to be removed. Further it uses set() to remove duplicate VMs
so we don't try to remove a VM's rules more than once leading to unwanted errors
in the log.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue of Security Groups not working in case of XenServer 6.5;
- Uses nethash ipset data-structure to store CIDRs (efficient than iphash and
avoids overflow errors in case users add /8 /4 ingress/egress cidrs)
- Support for ipset versions both on 6.2 and 6.5, both have different outputs. This
fixes the issue of destroy_network_rules_for_vm failing
- Implements defensive filtering of list, instead of popping last item without
checking if it's None or empty
- Greps using names that are 'quoted' to avoid bash errors
- Before setting up new network rule, tries to clean and remove old ipset entry
- Idents, whitespace and naming fixes
PS. This is my 1000th commit to the 🐵 project :)
This closes#186
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This is a defensive enhancement for KVM SG script that filters out empty string
instead of popping last item which may or may not be an empty string.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The recent discussed improvement has the risk that if 'sync' hangs, the reboot may be delayed in the same way as the 'reboot' command would do. To work around, we're adding a 5 second timeout. If it cannot sync in 5 seconds, it will not succeed anyway and we should proceed the reset.
@snuf: Could we use your OVM3 heartbeat script for other hypervisors as well? One way to do it seems like a nice idea :-)
As discussed with @wido @pyr and @nuxro added an extra log line.
Tested it and it logs fine (tested to local disk) when syncing first:
Apr 3 15:31:23 mcctest2 heartbeat: kvmheartbeat.sh system because it was unable to write the heartbeat to the storage
By the way, it did also log to the agent.log but this extra log has the benefit of ending up in the system log so you'll probably find it easier there. Existing logs:
2015-04-03 15:27:23,943 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 0
2015-04-03 15:28:23,944 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 1
2015-04-03 15:29:23,946 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 2
2015-04-03 15:30:23,948 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 3
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 4
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout; reboot the host
This closes#145
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When storage cannot be reached, it does not make sense to reboot as it will try to flush buffers, umount NFS mounts, etc. This will not work and thus cause a long delay. With this change, the box will reboot immediately (like pressing the reset button).
When there is large size VR configuration (aggregate commands) copying data to VR using vmops plugin was failed
because of the ARG_MAX size limitation. The configuration data size is around 300KB.
Updated this to create file in host by scp with file contents. This will create file in host.
Then copy the file from the host to VR using hte vmops createFileInDomr method.
In host file get created in /tmp/ with name VR-<UUID>.cfg, once it copied to VR this file will be removed.
(cherry picked from commit 619f014255)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This makes sure dom0 in xenserver doesn't get hammered
when copying templates. It doesn't make sense to use
the cache of dom0 as the template does not fit in
memory. The directIO flags prevent it from trying.
(cherry picked from commit 4e1527e87a)
Recent versions of libvirt (at least 0.9.8) will return an int when
queried for the ID of a domain, not a string. This breaks some parts of
the `security_group.py` script which expects a string containing an
int. Notably, this breaks the part handling VM reboots which is
therefore not executed.
Signed-off-by: Vincent Bernat <Vincent.Bernat@exoscale.ch>
Signed-off-by: Sebastien Goasguen <runseb@gmail.com>