This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Several fixes addressed:
- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.
This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).
This FR introduces two new global settings:
- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
for the agent's background task that checks and switches to agent's
preferred host.
The indirect.agent.lb.algorithm supports following algorithm options:
- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).
Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.
Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
'indirect.agent.lb.algorithm' global settings.
On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.
First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.
From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.
Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.
Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
During deletion of role, set name to null. This fixes concurrent
exception issue where previously it would rename the deleted role
with a timestamp.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes regression failures seen in Trillian, fixes NPEs that cause Travis related failures.
This also removes the aria2 dependency from rpms that require users to enable/install epel-release.
This finally updates the checksums for 4.11 systemvmtemplates in db upgrade path.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
Problem:
When a VM is deployed with in an Affinity group which has the cluster dedicated to a subdomain (zone is dedicated to parent domain) it is getting successful. We can also stop the vm and remove the affinity group, but if you want to add back the affinity it is failing.
Root cause:
During VM deployment there is clear check on affinity type (account/domain). Here since the acl_type is "domain" it does not expect to be same owner for entities.
But during update of affinity to VM there is no specific check for acl_type "domain".
Solution:
Fix is to make the access check similar to VM deployment where it does not expect to be same owner for entities if the acl_type is "domain".
- Migrate to embedded Jetty server.
- Improve ServerDaemon implementation.
- Introduce a new server.properties file for easier configuration.
- Have a single /etc/default/cloudstack-management to configure env.
- Reduce shaded jar file, removing unnecessary dependencies.
- Upgrade to Spring 5.x, upgrade several jar dependencies.
- Does not shade and include mysql-connector, used from classpath instead.
- Upgrade and use bountcastle as a separate un-shaded jar dependency.
- Remove tomcat related configuration and files.
- Have both embedded UI assets in uber jar and separate webapp directory.
- Refactor systemd and init scripts, cleanup packaging.
- Made cloudstack-setup-databases faster, using `urandom`.
- Remove unmaintained distro packagings.
- Moves creation and usage of server keystore in CA manager, this
deprecates the need to create/store cloud.jks in conf folder and
the db.cloud.keyStorePassphrase in db.properties file. This also
remove the need of the --keystore-passphrase in the
cloudstack-setup-encryption script.
- GZip contents dynamically in embedded Jetty
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-Authored-By: Prashanth Manthena <prashanth.manthena@nuagenetworks.net>
Co-Authored-By: Sigert Goeminne <sigert.goeminne@nuagenetworks.net>
Bug: https://issues.apache.org/jira/browse/CLOUDSTACK-9832
Detail:
When the VPC offering does not contain VpcVirtualRouter as a SourceNat provider,
then we will not add the interface in the public network to the VpcVR.
CLOUDSTACK-9832: Move isSrcNat check to VpcManager
- All tests should pass on KVM, Simulator
- Add test cases covering FSM state transitions and actions
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Removed three bg thread tasks, uses FSM event-trigger based scheduling
- On successful recovery, kicks VM HA
- Improves overall HA scheduling and task submission, lower DB access
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes issue of enabling dynamic roles based on the global setting
only. This also fixes application of the default role/permissions mapping
on upgrade from 4.8 and previous versions to 4.9+.
Previously, it would make additional check to ensure commands.properties
is not in the classpath however this creates confusion for admins who
may skip/skim through the rn/docs and assume that mere changing the
global settings was not enough.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows changing permission for existing role permissions, as those were static and could not be changed once created. It also provides the ability to change these permissions in the UI using a drop down menu for each permission rule, in which admin can select ‘Allow’ or ‘Deny’ permission.
Changes in the API:
This feature modifies behaviour of updateRolePermission API method:
New optional parameters ‘ruleid’ and ‘permission’ are introduced, they are mutual exclusive to ‘ruleorder’ parameter. This defines two use cases:
Update role permission: ‘ruleid’ and ‘permission’ parameters needed
Update rules order: ‘ruleorder’ parameter needed
Parameter ‘ruleorder’ is now optional
updateRolePermission providing ‘ruleorder’ parameter should be sent via POST
CloudStack has several background polling tasks that are spread across
the codebase, the aim of this work is to provide a single manager to
handle submission, execution and handling of background tasks. With
the framework implemented, existing oobm background task has been
refactored to use this manager.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When dedicating a resource (cluster or host) to a domain, the affinity
group which is created is visible to everyone rather than only to domain
that the cluster is dedicated to.
- Upgrades Maven dependency version to v1.55
- Fixes bountycastle usages and issues
- Adds timeout to jetty/annotation scanning
- Fixes servlet issue, uses servlet 3.1.0
- Downgrade javassist used by reflections to fix annotation process errors
- Make console-proxy-rdp bc dependency same as rest of the codebase
- Picks up PR #1510 by Daan
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Simplifies change password transactional logic without using pessmistic locks
- Adds a re-enter password field in the UI to valid ipmi/oobm password
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9180: Optimize concurrent VM deployment operation on same network
Check if VR needs to be allocated for a given network and only acquire lock if required
Refer to the bug for details.
* pr/1251:
CLOUDSTACK-9180: Optimize concurrent VM deployment operation on same network Check if VR needs to be allocated for a given network and only acquire lock if required
Signed-off-by: Will Stevens <williamstevens@gmail.com>
CLOUDSTACK-9299: Out-of-band Management for CloudStackSupport access to a hosts out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.
Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.
This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host
For testing this feature, please install `ipmitool` (using yum/apt/brew) and `ipmisim`:
https://pypi.python.org/pypi/ipmisim
The default ipmitool location is assumed in /usr/bin, if this is different in your env please fix the global setting, see FS for details on various global settings.
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack
/cc @jburwell @swill @abhinandanprateek @murali-reddy @borisstoyanov
* pr/1502:
maven: ignore utils/testsmallfileinactive for rat checking
CLOUDSTACK-9378: Fix for #1497
HypervisorUtilsTest: increate timeout to 8seconds
travis: Use patched version of ipmitool for tests
CLOUDSTACK-9299: Out-of-band Management for CloudStack
Signed-off-by: Will Stevens <williamstevens@gmail.com>
* 4.8:
CLOUDSTACK-9287 - Improve test by checking if pvt gw is removed and fix typos
Handle private gateways more reliably
CLOUDSTACK-9287 - Fix RVR public interface
CLOUDSTACK-9287 - Add integration test to cover the private gateway related changes
CLOUDSTACK-9287 - Refactor the interface state configuration
CLOUDSTACK-9287 - Check if the nic profile has already been removed from a certain router
CLOUDSTACK-9287 - Bring up the private gw interface on state change to master
CLOUDSTACK-9287 - Make sure private gw interface is not used for default gw
CLOUDSTACK-9287 - Add integration test to cover the private gw interface/mac address issues
CLOUDSTACK-9287 - Put private gateway interface down on backup router
CLOUDSTACK-9287 - Generate new mac address if router is redundant and nic profile exists
Add private gateway IP to router initialization config
apply static routes on change to master state
Support access to a host’s out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.
Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.
This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host
For testing this feature `ipmisim` can be used:
https://pypi.python.org/pypi/ipmisim
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows root administrators to define new roles and associate API
permissions to them.
A limited form of role-based access control for the CloudStack management server
API is provided through a properties file, commands.properties, embedded in the
WAR distribution. Therefore, customizing API permissions requires unpacking the
distribution and modifying this file consistently on all servers. The old system
also does not permit the specification of additional roles.
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Dynamic+Role+Based+API+Access+Checker+for+CloudStack
DB-Backed Dynamic Role Based API Access Checker for CloudStack brings following
changes, features and use-cases:
- Moves the API access definitions from commands.properties to the mgmt server DB
- Allows defining custom roles (such as a read-only ROOT admin) beyond the
current set of four (4) roles
- All roles will resolve to one of the four known roles types (Admin, Resource
Admin, Domain Admin and User) which maintains this association by requiring
all new defined roles to specify a role type.
- Allows changes to roles and API permissions per role at runtime including additions or
removal of roles and/or modifications of permissions, without the need
of restarting management server(s)
Upgrade/installation notes:
- The feature will be enabled by default for new installations, existing
deployments will continue to use the older static role based api access checker
with an option to enable this feature
- During fresh installation or upgrade, the upgrade paths will add four default
roles based on the four default role types
- For ease of migration, at the time of upgrade commands.properties will be used
to add existing set of permissions to the default roles. cloud.account
will have a new role_id column which will be populated based on default roles
as well
Dynamic-roles migration tool: scripts/util/migrate-dynamicroles.py
- Allows admins to migrate to the dynamic role based checker at a future date
- Performs a harder one-way migrate and update
- Migrates rules from existing commands.properties file into db and deprecates it
- Enables an internal hidden switch to enable dynamic role based checker feature
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- In case of redundant VPCs, the ACL items are revoked in the first iteration. Since the econd iteration
is needed in order to remove the private network, we have to check if the nic profile is gone before trying
to revoke the ACL items again, which would throw a NPE.
- Some variable extraction in order to ease debugging.
* 4.6:
CLOUDSTACK-9106 - Makes Enum name compliant with Java code conventions.
CLOUDSTACK-9106 - Adds a test to cover the changes in the applyVpnUsers() method
CLOUDSTACK-9106 - Makes the router commands call more consistent.
CLOUDSTACK-9106 - Enables private gateway tests on Redundant VPCs
CLOUDSTACK-9106 - Refactor the createPrivateNicProfileForGateway() method
CLOUDSTACK-9106 - Reduces the amount of iterations through the routers of a VPC
Add support for not (re)starting server after cloud-setup-management.
Closed PRs that will not be considered for merge:
This closes#1158
This closes#1097
- Changed the NetworkTopologyContext class just to make the private member accessible from the test
- Added a test class to cover the positive scenario of the VpcVirtualRouterElementTest.applyVpnUsers() method.
- Covering when there is either no VPC or no routers.
- It was causing problems because Nics were expected to be plugged before they actually exist. Only in rVPC cases.
- Applies ACL items to routers only after the Pvt GW is setup.
* 4.6:
CLOUDSTACK-9075 - Uses the same vlan since it should have been already released
CLOUDSTACK-9075 - Adds VPC static routes test
CLOUDSTACK-9075 - Covers Private GW ACL with Redundant VPCs
CLOUDSTACK-9075 - Add method to get list of Physical Networks per zone
CLOUDSTACK-6276 Removing unused parameter in integration test for projects
CLOUDSTACK-6276 Removing unused parameter in integration test
CLOUDSTACK-6276 Fixing affinity groups for projects