This issue occurs only with KVM hypervisor. Database entries for templates created from snapshots disappear after management-server service restart
# STEPS TO REPRODUCE
Create a ACS setup and add KVM hypervisor as host.
Create snapshot of any disk (root or data disk) of an instance.
Create template using disk snapshot.
Verify that template got downloaded completely and is in Ready state.
Also, verify that entry for this template is present in template_store_ref table in database.
Now restart management server.
Once management server is restarted completely and web UI is available, check the template status. It will be in Active state instead of downloaded.
Also, entry for this template vanishes from template_store_ref table in database.
# Fix for the Issue
In NfsSecondaryStorageResource.java class, inside method copySnapshotToTemplateFromNfsToNfs() bufferwriter which was created for writing data in template.properties file is not closed and hence few properties were not getting written in template.properties. As few properties were absent in template.properties file, so after management server restart, this template is not loaded and hence it goes into Active state.
- All tests should pass on KVM, Simulator
- Add test cases covering FSM state transitions and actions
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Removed three bg thread tasks, uses FSM event-trigger based scheduling
- On successful recovery, kicks VM HA
- Improves overall HA scheduling and task submission, lower DB access
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Summary: this commit alters column currency_value from table
cloud_usage.quota_tariff to support values up to 5 decimal places. The
current implementation allows up to 2 decimal places.
Issue: need to use more than 2 decimal places to define resources values
in Quota tariff.
Solution: modify column currency_value from table
cloud_usage.quota_tariff to support values up to 5 decimal places.
Values with more than 5 decimal places will be displayed with scientific
notation in the user interface.
SQL command: "ALTER TABLE cloud_usage.quota_tariff MODIFY currency_value
DECIMAL(15,5) not null"
This fixes issue of enabling dynamic roles based on the global setting
only. This also fixes application of the default role/permissions mapping
on upgrade from 4.8 and previous versions to 4.9+.
Previously, it would make additional check to ensure commands.properties
is not in the classpath however this creates confusion for admins who
may skip/skim through the rn/docs and assume that mere changing the
global settings was not enough.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
If a public IP is assigned to a VPC, a VM running inside that VPC cannot ping that public IP. This is due to the IPtables Nat rules set in such a way that drop any requests to the public IP from internal interfaces. I am fixing this so that internal hosts can also reach the public IP.
Reproduction:
Create a VPC
Create a network inside the VPC
Allocate a public IP
Create a VM in the network
Create a port forwarding rule enabling ICMP
ping the public IP inside the VM (this will fail)
Change default configuration for router.aggregation.command.each.timeout from 3 to 600 seconds (#2223)
(cherry picked from commit 17bc6afc82)
This fixes some test_nic failures caused due to short aggregation command timeout
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Change default configuration for router.aggregation.command.each.timeout from 3 to 600 seconds (#2223)
(cherry picked from commit 17bc6afc82)
This fixes some test_nic failures caused due to short aggregation command timeout
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows changing permission for existing role permissions, as those were static and could not be changed once created. It also provides the ability to change these permissions in the UI using a drop down menu for each permission rule, in which admin can select ‘Allow’ or ‘Deny’ permission.
Changes in the API:
This feature modifies behaviour of updateRolePermission API method:
New optional parameters ‘ruleid’ and ‘permission’ are introduced, they are mutual exclusive to ‘ruleorder’ parameter. This defines two use cases:
Update role permission: ‘ruleid’ and ‘permission’ parameters needed
Update rules order: ‘ruleorder’ parameter needed
Parameter ‘ruleorder’ is now optional
updateRolePermission providing ‘ruleorder’ parameter should be sent via POST