* scaleio: prototype storage plugin
- plugin skeleton
- add storage pool, create/attach data disk
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* kvm: attach disk example
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Updated ScaleIO storage plugin to support Volume operations
* ScaleIO storage plugin - Support for VM operations and other updates
* ScaleIO storage pool plugin changes
- Added validation to check existing ScaleIO storage pool and update capacity details
- Updated resize volume for ScaleIO to pick the rounded 8GB boundary size
- Added support for setting ScaleIO storage pool statistics (bandwidthLimitInKbps, iopsLimit)
* Fixed IOPS validation and volume size update when resizing ScaleIO volume
* Removed connect/disconnect disk changes from ScaleIO storage adaptor
- ScaleIO datastore driver does map/unmap ScaleIO volume (from MS) using grant/revoke access
- Not required to map/unmap ScaleIO volume from the storage adaptor
* Updated connect disk, to wait for ScaleIO volume to become available in the KVM host
* Updated ScaleIO storage provider, pool type, url scheme and related paramters to the new "PowerFlex" brand
* Fixed size rounding issue while creating PowerFlex volume and added validations to PowerFlex Gateway API client
* Updated host sdc connection check for ScaleIO/PowerFlex pool on host connect
* Updated volume snapshots support for volumes on ScaleIO/PowerFlex storage pool and Added some validations for ScaleIO disks in host
* Added primary storage level configurable setting "storage.pool.disk.wait" to wait for disk availability
- Confiure the disk availability wait time, mainly introduced for ScaleIO/PowerFlex storage pool (can be used for other managed storages), to wait for the disk to become available in the host before performing any operation on it
* Enabled template spooling to ScaleIO/PowerFlex storage pool and create VM from the spooled template.
Added ScaleIO SDC limits support for volumes using offering parameters: bandwidthLimitInKbps, iopsLimit.
* Added support for VM snapshots on ScaleIO/PowerFlex storage pool
Minor improvements for IOPS (SDC Limits) configuration
* Updated access for ScaleIO/PowerFlex volumes on VM Start and Stop
Added primary storage level configurable setting "storage.pool.client.timeout" for storage API client
Enabled cluster wide storage pool support for ScaleIO/PowerFlex storage
Minor improvements for ScaleIO/PowerFlex disk access in the KVM host
* Added support for direct download of templates (raw, qcow2) on ScaleIO/PowerFlex storage pool
* Added support for config drives in host cache for KVM
- Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
- Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
- Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
- Added new parameter "vm.configdrive.host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path for config drives
* Updated disk access while migrating the VM with volumes on ScaleIO/PowerFlex storage pool
Changed the parameter "vm.configdrive.host.cache.location" to "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties to specify the host cache path
Changes to create config drives on the "/config" directory on the host cache path
Changes to suppport migrate VM with config drive on the host cache path
* Additonal changes to support migrate VM with config drive on the host cache
* Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
Updated full deployment destination for preparing the network(s) on VM start
* Propagate the direct download certificates uploaded to the newly added KVM hosts
* Code improvements for ScaleIO/PowerFlex storage plugin
* Updated storage stats collection and tests for ScaleIO/PowerFlex storage plugin
* Fix for template size of direct download templates on capacity check for ScaleIO/PowerFlex storage pool
Updated data object grant and revoke access for connected SDCs to ScaleIO/PowerFlex storage pool
* Discover the template size for direct download templates using any available host from the zones specified on template registration
When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
* Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
* Ensure the volume to be expunged, is expunge ready on storage cleanup
* Do not set the storage migration flag for the volumes on zone wide PowerFlex/ScaleIO pool when listing the hosts available for cross-cluster migration
* Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
* Added alerts for PowerFlex/ScaleIO SDC disconnection on the host(s)
* Retry VM deployment/start when the host cannot access volume/template on the ScaleIO/PowerFlex storage
* Changes to find a potential host that can access the ScaleIO/PowerFlex storage pool
* Updated ScaleIO/PowerFlex storage pool stats for checking the available capacity and usage
* Updated ScaleIO/PowerFlex volumes naming convention to avoid the naming conflicts on sharing
* Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
- Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
* Updated ScaleIO/PowerFlex storage pool capacity stats
* Cleanup unused templates and host entries on PowerFlex/ScaleIO storage pool deletion
* Check the router filesystem is writable or not, before performing health checks
- Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
- The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
* Updated the router filesystem writable check using script, instead cmd execution
- Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
* Update volume stats (physical and virtual size) for the volumes on PowerFlex/ScaleIO storage pool
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In VM migration on KVM, libvirt qemu hook script will change the bridge name to bridges for guest networks. It works for user vm. However for virtual router, it has nics on control network and public network. If control/public use different physical networks than guest network, virtual router cannot be migrated.
Fixes: #2783
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.
Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.
In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.
This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/
I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.
I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have
KVM enabled by default.
Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.
Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.
In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Using this tool on a hypervisor admins can query KVM Instances running
on that hypervisor if they have the Qemu Guest Agent installed.
All System VMs have this and they can be queried.
For example:
$ cloudstack-guest-tool i-2-25-VM
This will print some information about network and filesystem status.
root@hv-138-a05-23:~# ./cloudstack-guest-tool s-11-VM --command info|jq
{
"network": [
{
"ip-addresses": [
{
"prefix": 8,
"ip-address": "127.0.0.1",
"ip-address-type": "ipv4"
}
],
"name": "lo",
"hardware-address": "00:00:00:00:00:00"
},
{
"ip-addresses": [
{
"prefix": 16,
"ip-address": "169.254.242.169",
"ip-address-type": "ipv4"
}
],
"name": "eth0",
"hardware-address": "0e:00:a9:fe:f2:a9"
},
...
...
"filesystem": [
{
"mountpoint": "/var",
"disk": [
{
"bus": 0,
"bus-type": "virtio",
"target": 0,
"unit": 0,
"pci-controller": {
"slot": 7,
"bus": 0,
"domain": 0,
"function": 0
}
}
],
"type": "ext4",
"name": "vda6"
},
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This fixes the qemu hooks `mkdir` race condition which can happen when
too many VMs may launch on a KVM host executing the hooks script that
tries to `mkdir` for the custom directory. On exception (multiple scripts
trying to mkdir), the VM stops.
The custom directory need not be created if it does not exist, instead
the custom hooks should only execute when there is a custom directory.
Feature documentation:
http://docs.cloudstack.apache.org/en/4.11.2.0/adminguide/hosts.html#kvm-libvirt-hook-script-include
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).
This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
KVM hook script include - logic to execute custom scripts & logging requirements
KVM hook script include - add logic to create custom directory if not exists & extra logging
* Cleaup and code-formatting POM files
* Remove obsolete mycila license-maven-plugin
* Remove obsolete console-proxy/plugin project
* Move console-proxy-rdbconsole under console-proxy parent
* Use correct parent path for rdpconsole
* Order alphabetally items in setnextversion.sh
* Unifiy License header in POMs
* Alphabetic order of modules definition
* Extract all defined versions into parent pom
* Remove obsolete files: version-info.in, configure-info.in
* Remove redundant defaultGoal
* Remove useless checkstyle plugin from checkstyle project
* Order alphabetally items in pom.xml
* Add aditional SPACEs to fix debian build
* Don't execute checkstyle on parent projects
* Use UTF-8 encoding in building checkstyle project
* Extract plugin versions into properties
* Execute PMD plugin on all the projects with -Penablefindbugs
* Upgrade maven plugins to latest version
* Make sure to always look for apache parent pom from repository
* Fix incorrect version grep in debian packaging
* Fix rebase conflicts
* Fix rebase conflicts
* Remove PMD for now to be fixed on another PR
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.
Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
In some environments running the keystore cert renewal (as root user)
over an already connected agent connection may cause exception
such as: `sudo: sorry, you must have a tty to run sudo`. Since, all
agents - KVM, CPVM and SSVM run as root user, we don't need to run
the renewal scripts with sudo.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent loses connection with management server, the reconnection
logic waits for any pending tasks to finish. However, when such tasks
do finish they fail to send an `Answer` back to managements server.
Therefore from a management server's perspective such pending
operations are stuck in a FSM state and need manual removal or fixing.
This is by design where management server's side cmd-answer request
pattern is code/execution dependent, therefore even if the answer
were to be sent when management server came back up (reconnects)
the management server will fail to acknowledge and process the answer
due to missing listeners or being in the exact state to handle answers.
Historically, the Agent would wait to reconnect until the internal
tasks complete but I found no reason why it should wait for reconnection
at all.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>