Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Makes role permissions orderable in UI/backend
- Role permissions evaluated by fixed order
- Rules draggable in UI
- Migration script adds a default order
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows root administrators to define new roles and associate API
permissions to them.
A limited form of role-based access control for the CloudStack management server
API is provided through a properties file, commands.properties, embedded in the
WAR distribution. Therefore, customizing API permissions requires unpacking the
distribution and modifying this file consistently on all servers. The old system
also does not permit the specification of additional roles.
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+Role+Based+API+Access+Checker+for+CloudStack
DB-Backed Dynamic Role Based API Access Checker for CloudStack brings following
changes, features and use-cases:
- Moves the API access definitions from commands.properties to the mgmt server DB
- Allows defining custom roles (such as a read-only ROOT admin) beyond the
current set of four (4) roles
- All roles will resolve to one of the four known roles types (Admin, Resource
Admin, Domain Admin and User) which maintains this association by requiring
all new defined roles to specify a role type.
- Allows changes to roles and API permissions per role at runtime including additions or
removal of roles and/or modifications of permissions, without the need
of restarting management server(s)
Upgrade/installation notes:
- The feature will be enabled by default for new installations, existing
deployments will continue to use the older static role based api access checker
with an option to enable this feature
- During fresh installation or upgrade, the upgrade paths will add four default
roles based on the four default role types
- For ease of migration, at the time of upgrade commands.properties will be used
to add existing set of permissions to the default roles. cloud.account
will have a new role_id column which will be populated based on default roles
as well
Dynamic-roles migration tool: scripts/util/migrate-dynamicroles.py
- Allows admins to migrate to the dynamic role based checker at a future date
- Performs a harder one-way migrate and update
- Migrates rules from existing commands.properties file into db and deprecates it
- Enables an internal hidden switch to enable dynamic role based checker feature
Deprecate commands.properties
- Fixes apidocs and marvin to be independent of commands.properties usage
- Removes bundling of commands.properties in deb/rpm packaging
- Removes file references across codebase
Reviewed-by: John Burwell <john.burwell@shapeblue.com>
QA-by: Boris Stoyanov <boris.stoyanov@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Backported PR #678 to 4.5
This I changed:
``jasypt='/usr/share/cloudstack-common/lib/jasypt-1.9.0.jar'``
in master it is 1.9.2. I changed it to 1.9.0 here, to match the original script.
Original commit IDs:
2f858a7d08ee9b644e288a1e79f518
Signed-off-by: Remi Bergsma <github@remi.nl>
We did not verify if the packets leaving an Instance had the correct
source address.
Any IP packet not matching the Instance IP(s) will be dropped
(cherry picked from commit 3e3c11ffca)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
VLAN id 4095 is commonly used as a 'tag passthrough' in virtualization environments
(VMware, specifically). This vlan id is incompatible with Linux, but we can
allow the admin to manually configure the bridge if the same passthrough is
desired.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Router VMs don't have a chain rule with -def suffix, this fixes name and
properly removes VR vms not running on a host
- Before trying to remove dnats, filter empty/None elements from list
- destroy_ebtables_rules should check what kind of action is request to be
performed (-A for add or -D for removed) and execute based on that
- Before executing any command, log it for debugging purposes
- Method to cleanup bridge, may be used in future
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The SG python script depends on ebtables-save which is not available on Debian
based distros (Ubuntu and Debian for example). The commit uses /proc/modules
to find available bridge tables (one of nat, filter or broute) and then
find VMs that need to be removed. Further it uses set() to remove duplicate VMs
so we don't try to remove a VM's rules more than once leading to unwanted errors
in the log.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue of Security Groups not working in case of XenServer 6.5;
- Uses nethash ipset data-structure to store CIDRs (efficient than iphash and
avoids overflow errors in case users add /8 /4 ingress/egress cidrs)
- Support for ipset versions both on 6.2 and 6.5, both have different outputs. This
fixes the issue of destroy_network_rules_for_vm failing
- Implements defensive filtering of list, instead of popping last item without
checking if it's None or empty
- Greps using names that are 'quoted' to avoid bash errors
- Before setting up new network rule, tries to clean and remove old ipset entry
- Idents, whitespace and naming fixes
PS. This is my 1000th commit to the 🐵 project :)
This closes#186
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This is a defensive enhancement for KVM SG script that filters out empty string
instead of popping last item which may or may not be an empty string.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The recent discussed improvement has the risk that if 'sync' hangs, the reboot may be delayed in the same way as the 'reboot' command would do. To work around, we're adding a 5 second timeout. If it cannot sync in 5 seconds, it will not succeed anyway and we should proceed the reset.
@snuf: Could we use your OVM3 heartbeat script for other hypervisors as well? One way to do it seems like a nice idea :-)
As discussed with @wido @pyr and @nuxro added an extra log line.
Tested it and it logs fine (tested to local disk) when syncing first:
Apr 3 15:31:23 mcctest2 heartbeat: kvmheartbeat.sh system because it was unable to write the heartbeat to the storage
By the way, it did also log to the agent.log but this extra log has the benefit of ending up in the system log so you'll probably find it easier there. Existing logs:
2015-04-03 15:27:23,943 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 0
2015-04-03 15:28:23,944 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 1
2015-04-03 15:29:23,946 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 2
2015-04-03 15:30:23,948 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 3
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 4
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout; reboot the host
This closes#145
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When storage cannot be reached, it does not make sense to reboot as it will try to flush buffers, umount NFS mounts, etc. This will not work and thus cause a long delay. With this change, the box will reboot immediately (like pressing the reset button).
When there is large size VR configuration (aggregate commands) copying data to VR using vmops plugin was failed
because of the ARG_MAX size limitation. The configuration data size is around 300KB.
Updated this to create file in host by scp with file contents. This will create file in host.
Then copy the file from the host to VR using hte vmops createFileInDomr method.
In host file get created in /tmp/ with name VR-<UUID>.cfg, once it copied to VR this file will be removed.
(cherry picked from commit 619f014255)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This makes sure dom0 in xenserver doesn't get hammered
when copying templates. It doesn't make sense to use
the cache of dom0 as the template does not fit in
memory. The directIO flags prevent it from trying.
(cherry picked from commit 4e1527e87a)
Recent versions of libvirt (at least 0.9.8) will return an int when
queried for the ID of a domain, not a string. This breaks some parts of
the `security_group.py` script which expects a string containing an
int. Notably, this breaks the part handling VM reboots which is
therefore not executed.
Signed-off-by: Vincent Bernat <Vincent.Bernat@exoscale.ch>
Signed-off-by: Sebastien Goasguen <runseb@gmail.com>