This PR creates a new API createConsoleAccess to create VM console URL allowing it to connect using other UI implementations. To avoid reply attacks, the console access is enhanced to use a one time token per session
New configuration added:
consoleproxy.extra.security.validation.enabled: Enable/disable extra security validation for console proxy using a token
Documentation PR: apache/cloudstack-documentation#284
* agent: enable ssl only for kvm agent (not in system vms)
* Revert "agent: enable ssl only for kvm agent (not in system vms)"
This reverts commit b2d76bad2e.
* Revert "KVM: Enable SSL if keystore exists (#6200)"
This reverts commit 4525f8c8e7.
* KVM: Enable SSL if keystore exists in LibvirtComputingResource.java
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency
* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp
* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup
* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes
* Add ssh to k8s nodes details in the Access tab on the UI
* test
* Refactor ca/cert patching logic
* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script
* remove all references of systemvm.iso
* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs
* fix script timeout
* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand
* remove commented code + change core user to cloud for cks nodes
* Update ownership of ssh directory
* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)
* Add UI changes + move changes from patch file to runcmd
* test: validate performance for template modification during seeding
* create vms folder in cloudstack-commons directory - debian rules
* remove logic for on the fly template convert + update k8s test
* fix syntax issue - causing issue with shared network tests
* Code cleanup
* refactor patching logic - certs
* move logic of fixing rootdiskcontroller from upgrade to kubernetes service
* add livepatch option to restart network & vpc
* smooth upgrade of cks clusters
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency
* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp
* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup
* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes
* Add ssh to k8s nodes details in the Access tab on the UI
* test
* Refactor ca/cert patching logic
* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script
* remove all references of systemvm.iso
* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs
* fix script timeout
* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand
* remove commented code + change core user to cloud for cks nodes
* Update ownership of ssh directory
* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)
* Add UI changes + move changes from patch file to runcmd
* test: validate performance for template modification during seeding
* create vms folder in cloudstack-commons directory - debian rules
* remove logic for on the fly template convert + update k8s test
* fix syntax issue - causing issue with shared network tests
* Code cleanup
* add cgroup config for containerd
* add systemd config for kubelet
* add additional info during image registry config
* address comments
* add temp links of download.cloudstack.org
* address part of the comments
* address comments
* update containerd config - as version has upgraded to 1.5 from 1.4.12 in 4.17.0
* address comments - simplify
* fix vue3 related icon changes
* allow network commands when router template version is lower but is patched
* add internal LB to the list of routers to be patched on network restart with live patch
* add unit tests for API param validations and new helper utilities - file scp & checksum validations
* perform patching only for non-user i.e., system VMs
* add test to validate params
* remove unused import
* add column to domain_router to display software version and support networkrestart with livePatch from router view
* Requires upgrade column to consider package (cloud-scripts) checksum to identify if true/false
* use router software version instead of checksum
* show N/A if no software version reported i.e., in upgraded envs
* fix deb failure
* update pom to official links of systemVM template
When using advanced virtualization the IO Driver is not supported. The
admin will decide if want to enable/disable this configuration from
agent.properties file. The default value is true
* This PR/commit comprises of the following:
- Support to fallback on the older systemVM template in case of no change in template across ACS versions
- Update core user to cloud in CKS
- Display details of accessing CKS nodes in the UI - K8s Access tab
- Update systemvm template from debian 11 to debian 11.2
- Update letsencrypt cert
- Remove docker dependency as from ACS 4.16 onward k8s has deprecated support for docker - use containerd as container runtime
* support for private registry - containerd
* Enable updating template type (only) for system owned templates via UI
* edit indents
* Address comments and move cmd from patch file to cloud-init runcmd
* temporary change
* update k8s test to use k8s version 1.21.5 (instead of 1.21.3 - due to https://github.com/kubernetes/kubernetes/pull/104530)
* support for private registry - containerd
* Enable updating template type (only) for system owned templates via UI
* smooth upgrade of cks clusters
* update pom file with temp download.cloudstack.org testing links
* fix pom
* add cgroup config for containerd
* add systemd config for kubelet
* add additional info during image registry config
* update to official links
* Externalize KVM Agent storage's timeout configuration
* Address @nvazquez review
* Add empty line at the end of the agent.properties file
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* Externalize KVM Agent storage's timeout configuration
Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.
* Refactored KVHAMonitor nested thread and changed some logs
* It has been added the timeout's config in the agent.properties file
* Rename classes
* Rename var and remove comment
* Fix typo with word "heartbeat"
* Extract multiple methods call to variables
* Add unit tests to file handler
* Increase info about the property
* Create inner class Property
* Rename method getProperty to getPropertyValue
* Remove copyright
* Remove copyright
* Extract code to createHeartBeatCommand
* Change method access from protected to private
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* novnc: Add client IP check for novnc console in cloudstack 4.16
* novnc ip check : Fix restart CPVM or mgt server does not update novnc param
* novnc ip check: move to method
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ
Documentation PR: apache/cloudstack-documentation#169
This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack
Other improvements addressed in addition to PowerFlex/ScaleIO support:
- Added support for config drives in host cache for KVM
=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
- Updated full deployment destination for preparing the network(s) on VM start
- Propagate the direct download certificates uploaded to the newly added KVM hosts
- Discover the template size for direct download templates using any available host from the zones specified on template registration
=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
- Retry VM deployment/start when the host cannot grant access to volume/template
- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
- Check the router filesystem is writable or not, before performing health checks
=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)
* Addressed some issues for few operations on PowerFlex storage pool.
- Updated migration volume operation to sync the status and wait for migration to complete.
- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.
- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
Other fixes included:
- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
- Perform basic tests (for connectivity and file system) on router before updating the health check config data
=> Validate the basic tests (connectivity and file system check) on router
=> Cleanup the health check results when router is destroyed
* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0
* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool
and Minor improvements in PowerFlex/ScaleIO storage plugin code
* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.
- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.
* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py
* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.
* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
Added property to agent.properties that enables or disables the iscsi session clean up feature. #4210
Added a condition to prevent disk partitions from being cleaned up. #4216
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
* DB : Add support for MySQL 8
- Splits commands to create user and grant access on database, the old
statement is no longer supported by MySQL 8.x
- `NO_AUTO_CREATE_USER` is no longer supported by MySQL 8.x so remove
that from db.properties conn parameters
For mysql-server 8.x setup the following changes were added/tested to
make it work with CloudStack in /etc/mysql/mysql.conf.d/mysqld.cnf and
then restart the mysql-server process:
server_id = 1
sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION"
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=1000
log-bin=mysql-bin
binlog-format = 'ROW'
default-authentication-plugin=mysql_native_password
Notice the last line above, this is to reset the old password based
authentication used by MySQL 5.x.
Developers can set empty password as follows:
> sudo mysql -u root
ALTER USER 'root'@'localhost' IDENTIFIED BY '';
In libvirt repository, there are two related commits
2019-08-23 13:13 Daniel P. Berrangé ● rpm: don't enable socket activation in upgrade if --listen present
2019-08-22 14:52 Daniel P. Berrangé ● remote: forbid the --listen arg when systemd socket activation
In libvirt.spec.in
/bin/systemctl mask libvirtd.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-ro.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-admin.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tls.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tcp.socket >/dev/null 2>&1 || :
Co-authored-by: Wei Zhou <w.zhou@global.leaseweb.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
Currently CloudStack is using logging frameworks as log4j and Java util logging, logging wrappers as slf4j and Apache common logging.
Here changes are to made it uniform, using only log4j framework.
Removed Java util logging, slf4j and Apache common logging.
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In VM migration on KVM, libvirt qemu hook script will change the bridge name to bridges for guest networks. It works for user vm. However for virtual router, it has nics on control network and public network. If control/public use different physical networks than guest network, virtual router cannot be migrated.
Fixes: #2783
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.
Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.
In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.
This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/
I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.
I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have
KVM enabled by default.
Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.
Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.
In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Using this tool on a hypervisor admins can query KVM Instances running
on that hypervisor if they have the Qemu Guest Agent installed.
All System VMs have this and they can be queried.
For example:
$ cloudstack-guest-tool i-2-25-VM
This will print some information about network and filesystem status.
root@hv-138-a05-23:~# ./cloudstack-guest-tool s-11-VM --command info|jq
{
"network": [
{
"ip-addresses": [
{
"prefix": 8,
"ip-address": "127.0.0.1",
"ip-address-type": "ipv4"
}
],
"name": "lo",
"hardware-address": "00:00:00:00:00:00"
},
{
"ip-addresses": [
{
"prefix": 16,
"ip-address": "169.254.242.169",
"ip-address-type": "ipv4"
}
],
"name": "eth0",
"hardware-address": "0e:00:a9:fe:f2:a9"
},
...
...
"filesystem": [
{
"mountpoint": "/var",
"disk": [
{
"bus": 0,
"bus-type": "virtio",
"target": 0,
"unit": 0,
"pci-controller": {
"slot": 7,
"bus": 0,
"domain": 0,
"function": 0
}
}
],
"type": "ext4",
"name": "vda6"
},
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This fixes the qemu hooks `mkdir` race condition which can happen when
too many VMs may launch on a KVM host executing the hooks script that
tries to `mkdir` for the custom directory. On exception (multiple scripts
trying to mkdir), the VM stops.
The custom directory need not be created if it does not exist, instead
the custom hooks should only execute when there is a custom directory.
Feature documentation:
http://docs.cloudstack.apache.org/en/4.11.2.0/adminguide/hosts.html#kvm-libvirt-hook-script-include
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).
This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
KVM hook script include - logic to execute custom scripts & logging requirements
KVM hook script include - add logic to create custom directory if not exists & extra logging
* Cleaup and code-formatting POM files
* Remove obsolete mycila license-maven-plugin
* Remove obsolete console-proxy/plugin project
* Move console-proxy-rdbconsole under console-proxy parent
* Use correct parent path for rdpconsole
* Order alphabetally items in setnextversion.sh
* Unifiy License header in POMs
* Alphabetic order of modules definition
* Extract all defined versions into parent pom
* Remove obsolete files: version-info.in, configure-info.in
* Remove redundant defaultGoal
* Remove useless checkstyle plugin from checkstyle project
* Order alphabetally items in pom.xml
* Add aditional SPACEs to fix debian build
* Don't execute checkstyle on parent projects
* Use UTF-8 encoding in building checkstyle project
* Extract plugin versions into properties
* Execute PMD plugin on all the projects with -Penablefindbugs
* Upgrade maven plugins to latest version
* Make sure to always look for apache parent pom from repository
* Fix incorrect version grep in debian packaging
* Fix rebase conflicts
* Fix rebase conflicts
* Remove PMD for now to be fixed on another PR
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.
Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
In some environments running the keystore cert renewal (as root user)
over an already connected agent connection may cause exception
such as: `sudo: sorry, you must have a tty to run sudo`. Since, all
agents - KVM, CPVM and SSVM run as root user, we don't need to run
the renewal scripts with sudo.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent loses connection with management server, the reconnection
logic waits for any pending tasks to finish. However, when such tasks
do finish they fail to send an `Answer` back to managements server.
Therefore from a management server's perspective such pending
operations are stuck in a FSM state and need manual removal or fixing.
This is by design where management server's side cmd-answer request
pattern is code/execution dependent, therefore even if the answer
were to be sent when management server came back up (reconnects)
the management server will fail to acknowledge and process the answer
due to missing listeners or being in the exact state to handle answers.
Historically, the Agent would wait to reconnect until the internal
tasks complete but I found no reason why it should wait for reconnection
at all.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Several fixes addressed:
- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.
This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).
This FR introduces two new global settings:
- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
for the agent's background task that checks and switches to agent's
preferred host.
The indirect.agent.lb.algorithm supports following algorithm options:
- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).
Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.
Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
'indirect.agent.lb.algorithm' global settings.
On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.
First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.
From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.
Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.
Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This would make keystore utility scripts being executed as sudoer
in case the process uid/owner is not root but still a sudoer user.
Also fails addHost while securing a KVM host and if keystore fails to be
setup for any reason.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Failure on HTTPS downloader for Direct Download templates on KVM.
Reason: Incorrect request caused NullPointerException getting the response InputStream
CLOUDSTACK-10269: On deletion of role set name to null (#2444)
CLOUDSTACK-10146 checksum in java instead of script (#2405)
CLOUDSTACK-10222: Clean snaphosts from primary storage when taking (#2398)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Renamed cloudstack-agent.logrotate to cloudstack-agent.logrotate.in,
so Ant will run the filterchain while copying.
This made the ant run copy block of cloudstack-agent.logrotate unnecessary,
so this is removed.
This fixes move refactoring error introduced in #2283
For instance, the class DatadiskTO is supposed to be in com.cloud.agent.api.to package. However, the folder structure it was placed in is com.cloud.agent.api.api.to.
Skip tests for cloud-plugin-hypervisor-ovm3:
For some unknown reason, there are quite a lot of broken test cases for cloud-plugin-hypervisor-ovm3. They might have appeared after some dependency upgrade and was overlooked by the person updating them. I checked them to see if they could be fixed, but these tests are not developed in a clear and clean manner. On top of that, we do not see (at least I) people using OVM3-hypervisor with ACS. Therefore, I decided to skip them.
Identention corrected to use spaces instead of tabs in XML files
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.
- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
* Cleanup and Improve NetUtils
This class had many unused methods, inconsistent names and redundant code.
This commit cleans up code, renames a few methods and constants.
The global/account setting 'api.allowed.source.cidr.list' is set
to 0.0.0.0/0,::/0 by default preserve the current behavior and thus
allow API calls for accounts from all IPv4 and IPv6 subnets.
Users can set it to a comma-separated list of IPv4/IPv6 subnets to
restrict API calls for Admin accounts to certain parts of their network(s).
This is to improve Security. Should an attacker steal the Access/Secret key
of an account he/she still needs to be in a subnet from where accounts are
allowed to perform API calls.
This is a good security measure for APIs which are connected to the public internet.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
- Refactors and simplifies systemvm codebase file structures keeping
the same resultant systemvm.iso packaging
- Password server systemd script and new postinit script that runs
before sshd starts
- Fixes to keepalived and conntrackd config to make rVRs work again
- New /etc/issue featuring ascii based cloudmonkey logo/message and
systemvmtemplate version
- SystemVM python codebase linted and tested. Added pylint/pep to
Travis.
- iptables re-application fixes for non-VR systemvms.
- SystemVM template build fixes.
- Default secondary storage vm service offering boosted to have 2vCPUs
and RAM equal to console proxy.
- Fixes to several marvin based smoke tests, especially rVR related
tests. rVR tests to consider 3*advert_int+skew timeout before status
is checked.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Refactor cloud-early-config and make appliance specific scripts
- Make patching work without requiring restart of appliance and remove
postinit script
- Migrate to systemd, speedup booting/loading
- Takes about 5-15s to boot on KVM, and 10-30seconds for VMware and XenServer
- Appliance boots and works on KVM, VMware, XenServer and HyperV
- Update Debian9 ISO url with sha512 checksum
- Speedup console proxy service launch
- Enable additional kernel modules
- Remove unknown ssh key
- Update vhd-util URL as previous URL was down
- Enable sshd by default
- Use hostnamectl to add hostname
- Disable services by default
- Use existing log4j xml, patching not necessary by cloud-early-config
- Several minor fixes and file refactorings, removed dead code/files
- Removes inserv
- Fix dnsmasq config syntax
- Fix haproxy config syntax
- Fix smoke tests and improve performance
- Fix apache pid file path in cloud.monitoring per the new template
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Commit enables a new feature for KVM hypervisor which purpose is to increase virtually amount of RAM available beyond the actual limit.
There is a new parameter in agent.properties: host.overcommit.mem.mb which enables adding specified amount of RAM to actually available. It is necessary to utilize KSM and ZSwap features which extend RAM with deduplication and compression.
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.
If these heartbeats are no longer received by the HV it will reset the Instance.
If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.
This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog
To the 'devices' section this will be added:
In the agent.properties the action to be taken can be defined:
vm.watchdog.action=reset
The same goes for the model. The Intel i6300esb is however the most commonly used.
vm.watchdog.model=i6300esb
When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.
If these heartbeats are no longer received by the HV it will reset the Instance.
If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.
This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog
To the 'devices' section this will be added:
<watchdog model='i6300esb' action='reset'/>
In the agent.properties the action to be taken can be defined:
vm.watchdog.action=reset
The same goes for the model. The Intel i6300esb is however the most commonly used.
vm.watchdog.model=i6300esb
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This commit adds a additional VirtIO channel with the name
'org.qemu.guest_agent.0' to all Instances.
With the Qemu Guest Agent the Hypervisor gains more control over the Instance if
these tools are present inside the Instance, for example:
* Power control
* Flushing filesystems
* Fetching Network information
In the future this should allow safer snapshots on KVM since we can instruct the
Instance to flush the filesystems prior to snapshotting the disk.
More information: http://wiki.qemu.org/Features/QAPI/GuestAgent
Keep in mind that on Ubuntu AppArmor still needs to be disabled since the default
AppArmor profile doesn't allow libvirt to write into /var/lib/libvirt/qemu
This commit does not add any communication methods through API-calls, it merely
adds the channel to the Instances and installs the Guest Agent in the SSVMs.
With the addition of the Qemu Guest Agent channel a second channel appears in /dev
on a SSVM as a VirtIO port.
The order in which the ports are defined in the XML matters for the naming inside
the SSVM VM and by not relying on /dev/vportXX but looking for a static name the
SSVM still boots properly if the order in the XML definition is changed.
A SSVM with both ports attached will have something like this:
root@v-215-VM:~# ls -l /dev/virtio-ports
total 0
lrwxrwxrwx 1 root root 11 May 13 21:41 org.qemu.guest_agent.0 -> ../vport0p2
lrwxrwxrwx 1 root root 11 May 13 21:41 v-215-VM.vport -> ../vport0p1
root@v-215-VM:~# ls -l /dev/vport*
crw------- 1 root root 251, 1 May 13 21:41 /dev/vport0p1
crw------- 1 root root 251, 2 May 13 21:41 /dev/vport0p2
root@v-215-VM:~#
In this case the SSVM port points to /dev/vport0p1, but if the order in the XML
is different it might point to /dev/vport0p2
By looking for a portname with a pre-defined pattern in /dev/virtio-ports we
do not rely on the order in the XML definition.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
By adding a Random Number Generator device to Instances we can prevent
entropy starvation inside guest.
The default source is /dev/random on the host, but this can be configured
to another source when present, for example a hardware RNG.
When enabled it will add the following to the Instance's XML definition:
<rng model='virtio'>
<rate period='1000' bytes='2048' />
<backend model='random'>/dev/random</backend>
</rng>
If the Instance has the proper support, which most modern distributions have,
it will have a /dev/hwrng device which it can use for gathering entropy.
More information: https://libvirt.org/formatdomain.html#elementsRng