* This PR/commit comprises of the following:
- Support to fallback on the older systemVM template in case of no change in template across ACS versions
- Update core user to cloud in CKS
- Display details of accessing CKS nodes in the UI - K8s Access tab
- Update systemvm template from debian 11 to debian 11.2
- Update letsencrypt cert
- Remove docker dependency as from ACS 4.16 onward k8s has deprecated support for docker - use containerd as container runtime
* support for private registry - containerd
* Enable updating template type (only) for system owned templates via UI
* edit indents
* Address comments and move cmd from patch file to cloud-init runcmd
* temporary change
* update k8s test to use k8s version 1.21.5 (instead of 1.21.3 - due to https://github.com/kubernetes/kubernetes/pull/104530)
* support for private registry - containerd
* Enable updating template type (only) for system owned templates via UI
* smooth upgrade of cks clusters
* update pom file with temp download.cloudstack.org testing links
* fix pom
* add cgroup config for containerd
* add systemd config for kubelet
* add additional info during image registry config
* update to official links
* Externalize KVM Agent storage's timeout configuration
* Address @nvazquez review
* Add empty line at the end of the agent.properties file
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* Externalize KVM Agent storage's timeout configuration
Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.
* Refactored KVHAMonitor nested thread and changed some logs
* It has been added the timeout's config in the agent.properties file
* Rename classes
* Rename var and remove comment
* Fix typo with word "heartbeat"
* Extract multiple methods call to variables
* Add unit tests to file handler
* Increase info about the property
* Create inner class Property
* Rename method getProperty to getPropertyValue
* Remove copyright
* Remove copyright
* Extract code to createHeartBeatCommand
* Change method access from protected to private
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* novnc: Add client IP check for novnc console in cloudstack 4.16
* novnc ip check : Fix restart CPVM or mgt server does not update novnc param
* novnc ip check: move to method
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
Currently CloudStack is using logging frameworks as log4j and Java util logging, logging wrappers as slf4j and Apache common logging.
Here changes are to made it uniform, using only log4j framework.
Removed Java util logging, slf4j and Apache common logging.
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).
This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In some environments running the keystore cert renewal (as root user)
over an already connected agent connection may cause exception
such as: `sudo: sorry, you must have a tty to run sudo`. Since, all
agents - KVM, CPVM and SSVM run as root user, we don't need to run
the renewal scripts with sudo.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent loses connection with management server, the reconnection
logic waits for any pending tasks to finish. However, when such tasks
do finish they fail to send an `Answer` back to managements server.
Therefore from a management server's perspective such pending
operations are stuck in a FSM state and need manual removal or fixing.
This is by design where management server's side cmd-answer request
pattern is code/execution dependent, therefore even if the answer
were to be sent when management server came back up (reconnects)
the management server will fail to acknowledge and process the answer
due to missing listeners or being in the exact state to handle answers.
Historically, the Agent would wait to reconnect until the internal
tasks complete but I found no reason why it should wait for reconnection
at all.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Several fixes addressed:
- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.
This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).
This FR introduces two new global settings:
- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
for the agent's background task that checks and switches to agent's
preferred host.
The indirect.agent.lb.algorithm supports following algorithm options:
- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).
Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.
Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
'indirect.agent.lb.algorithm' global settings.
On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.
First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.
From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.
Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.
Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This would make keystore utility scripts being executed as sudoer
in case the process uid/owner is not root but still a sudoer user.
Also fails addHost while securing a KVM host and if keystore fails to be
setup for any reason.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Failure on HTTPS downloader for Direct Download templates on KVM.
Reason: Incorrect request caused NullPointerException getting the response InputStream
CLOUDSTACK-10269: On deletion of role set name to null (#2444)
CLOUDSTACK-10146 checksum in java instead of script (#2405)
CLOUDSTACK-10222: Clean snaphosts from primary storage when taking (#2398)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes move refactoring error introduced in #2283
For instance, the class DatadiskTO is supposed to be in com.cloud.agent.api.to package. However, the folder structure it was placed in is com.cloud.agent.api.api.to.
Skip tests for cloud-plugin-hypervisor-ovm3:
For some unknown reason, there are quite a lot of broken test cases for cloud-plugin-hypervisor-ovm3. They might have appeared after some dependency upgrade and was overlooked by the person updating them. I checked them to see if they could be fixed, but these tests are not developed in a clear and clean manner. On top of that, we do not see (at least I) people using OVM3-hypervisor with ACS. Therefore, I decided to skip them.
Identention corrected to use spaces instead of tabs in XML files
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.
- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
* Cleanup and Improve NetUtils
This class had many unused methods, inconsistent names and redundant code.
This commit cleans up code, renames a few methods and constants.
The global/account setting 'api.allowed.source.cidr.list' is set
to 0.0.0.0/0,::/0 by default preserve the current behavior and thus
allow API calls for accounts from all IPv4 and IPv6 subnets.
Users can set it to a comma-separated list of IPv4/IPv6 subnets to
restrict API calls for Admin accounts to certain parts of their network(s).
This is to improve Security. Should an attacker steal the Access/Secret key
of an account he/she still needs to be in a subnet from where accounts are
allowed to perform API calls.
This is a good security measure for APIs which are connected to the public internet.
Signed-off-by: Wido den Hollander <wido@widodh.nl>