* Live storage migration of volume in scaleIO within same storage scaleio cluster
* Added migrate command
* Recent changes of migration across clusters
* Fixed uuid
* recent changes
* Pivot changes
* working blockcopy api in libvirt
* Checking block copy status
* Formatting code
* Fixed failures
* code refactoring and some changes
* Removed unused methods
* removed unused imports
* Unit tests to check if volume belongs to same or different storage scaleio cluster
* Unit tests for volume livemigration in ScaleIOPrimaryDataStoreDriver
* Fixed offline volume migration case and allowed encrypted volume migration
* Added more integration tests
* Support for migration of encrypted volumes across different scaleio clusters
* Fix UI notifications for migrate volume
* Data volume offline migration: save encryption details to destination volume entry
* Offline storage migration for scaleio encrypted volumes
* Allow multiple Volumes to be migrated with migrateVirtualMachineWithVolume API
* Removed unused unittests
* Removed duplicate keys in migrate volume vue file
* Fix Unit tests
* Add volume secrets if does not exists during volume migrations. secrets are getting cleared on package upgrades.
* Fix secret UUID for encrypted volume migration
* Added a null check for secret before removing
* Added more unit tests
* Fixed passphrase check
* Add image options to the encypted volume conversion
This PR adds two vm setting for user vms on KVM
- nic multiqueue number
- packed virtqueues enabled . optional are true and false (false by default). It requires qemu>=4.2.0 and libvirt >=6.3.0
Tested ok on ubuntu 22 and rocky 8.4
* Auto Enable Disable KVM hosts
* Improve health check result
* Fix corner cases
* Script path refactor
* Fix sonar cloud reports
* Fix last code smells
* Add marvin tests
* Fix new line on agent.properties to prevent host add failures
* Send alert on auto-enable-disable and add annotations when the setting is enabled
* Address reviews
* Add a reason for enabling or disabling a host when the automatic feature is enabled
* Fix comment on the marvin test description
* Fix for disabling the feature if the admin has manually updated the host resource state before any health check result
* Cleanup in the javadocs of QemuImg
* Update QemuImg.java
* Apply suggestions from code review
Co-authored-by: Stephan Krug <stekrug@icloud.com>
Co-authored-by: cloudstack-lab-gabriel <gabriel.fernandes@scclouds.com.br>
Co-authored-by: Stephan Krug <stekrug@icloud.com>
This PR has 3 improvements for the Linstor primary storage driver:
- Create a separate jar of it and move all Linstor related classes into the correct project (similar to the storpool plugin)
- Add aux properties for Cloudstack volumes in Linstor to make it easier to identify them in Linstor
- Add support for IOPs settings with the Linstor storage plugin
This PR fixes the issue that volume snapshot fails on RBD storage with the following error
qemu-img: Could not open 'driver=raw,file.filename=rbd:cloudstack/test_wei.img:mon_host=10.0.32.254:auth_supported=cephx:id=cloudstack:key=AQDwHTNjjHXRKRAAJb+AToFr6x4a1AvKUc4Ksg==:rbd_default_format=2:client_mount_timeout=30': Could not open 'rbd:cloudstack/test_wei.img:mon_host=10.0.32.254:auth_supported=cephx:id=cloudstack:key=AQDwHTNjjHXRKRAAJb+AToFr6x4a1AvKUc4Ksg==:rbd_default_format=2:client_mount_timeout=30': No such file or directory
However, it works without using image options
Therefore, do not pass the image options if the image format is not QCOW2 and LUKS.
This PR introduces a feature designed to allow CloudStack to manage a generic volume encryption setting. The encryption is handled transparently to the guest OS, and is intended to handle VM guest data encryption at rest and possibly over the wire, though the actual encryption implementation is up to the primary storage driver.
In some cases cloud customers may still prefer to maintain their own guest-level volume encryption, if they don't trust the cloud provider. However, for private cloud cases this greatly simplifies the guest OS experience in terms of running volume encryption for guests without the user having to manage keys, deal with key servers and guest booting being dependent on network connectivity to them (i.e. Tang), etc, especially in cases where users are attaching/detaching data disks and moving them between VMs occasionally.
The feature can be thought of as having two parts - the API/control plane (which includes scheduling aspects), and the storage driver implementation.
This initial PR adds the encryption setting to disk offerings and service offerings (for root volume), and implements encryption support for KVM SharedMountPoint, NFS, Local, and ScaleIO storage pools.
NOTE: While not required, operations can be significantly sped up by ensuring that hosts have the `rng-tools` package and service installed and running on the management server and hypervisors. For EL hosts the service is `rngd` and for Debian it is `rng-tools`. In particular, the use of SecureRandom for generating volume passphrases can be slow if there isn't a good source of entropy. This could affect testing and build environments, and otherwise would only affect users who actually use the encryption feature. If you find tests or volume creates blocking on encryption, check this first.
### Management Server
##### API
* createDiskOffering now has an 'encrypt' Boolean
* createServiceOffering now has an 'encryptroot' Boolean. The 'root' suffix is added here in case there is ever any other need to encrypt something related to the guest configuration, like the RAM of a VM. This has been refactored to deal with the new separation of service offering from disk offering internally.
* listDiskOfferings shows encryption support on each offering, and has an encrypt boolean to choose to list only offerings that do or do not support encryption
* listServiceOfferings shows encryption support on each offering, and has an encrypt boolean to choose to list only offerings that do or do not support encryption
* listHosts now shows encryption support of each hypervisor host via `encryptionsupported`
* Volumes themselves don't show encryption on/off, rather the offering should be referenced. This follows the same pattern as other disk offering based settings such as the IOPS of the volume.
##### Volume functions
A decent effort has been made to ensure that the most common volume functions have either been cleanly supported or blocked. However, for the first release it is advised to mark this feature as *experimental*, as the code base is complex and there are certainly edge cases to be found.
Many of these features could eventually be supported over time, such as creating templates from encrypted volumes, but the effort and size of the change is already overwhelming.
Supported functions:
* Data Volume create
* VM root volume create
* VM root volume reinstall
* Offline volume snapshot/restore
* Migration of VM with storage (e.g. local storage VM migration)
* Resize volume
* Detach/attach volume
Blocked functions:
* Online volume snapshot
* VM snapshot w/memory
* Scheduled snapshots (would fail when VM is running)
* Disk offering migration to offerings that don't have matching encryption
* Creating template from encrypted volume
* Creating volume from encrypted volume
* Volume extraction (would we decrypt it first, or expose the key? Probably the former).
##### Primary Storage Support
For storage developers, adding encryption support involves:
1. Updating the `StoragePoolType` for your primary storage to advertise encryption support. This is used during allocation of storage to match storage types that support encryption to storage that supports it.
2. Implementing encryption feature when your `PrimaryDataStoreDriver` is called to perform volume lifecycle functions on volumes that are requesting encryption. You are free to do what your storage supports - this could be as simple as calling a storage API with the right flag when creating a volume. Or (as is the case with the KVM storage types), as complex as managing volume details directly at the hypervisor host. The data objects passed to the storage driver will contain volume passphrases, if encryption is requested.
##### Scheduling
For the KVM implementations specified above, we are dependent on the KVM hosts having support for volume encryption tools. As such, the hosts `StartupRoutingCommand` has been modified to advertise whether the host supports encryption. This is done via a probe during agent startup to look for functioning `cryptsetup` and support in `qemu-img`. This is also visible via the listHosts API and the host details in the UI. This was patterned after other features that require hypervisor support such as UEFI.
The `EndPointSelector` interface and `DefaultEndpointSelector` have had new methods added, which allow the caller to ask for endpoints that support encryption. This can be used by storage drivers to find the proper hosts to send storage commands that involve encryption. Not all volume activities will require a host to support encryption (for example a snapshot backup is a simple file copy), and this is the reason why the interface has been modified to allow for the storage driver to decide, rather than just passing the data objects to the EndpointSelector and letting the implementation decide.
VM scheduling has also been modified. When a VM start is requested, if any volume that requires encryption is attached, it will filter out hosts that don't support encryption.
##### DB Changes
A volume whose disk offering enables encryption will get a passphrase generated for it before its first use. This is stored in the new 'passphrase' table, and is encrypted using the CloudStack installation's standard configured DB encryption. A field has been added to the volumes table, referencing this passphrase, and a foreign key added to ensure passphrases that are referenced can't be removed from the database. The volumes table now also contains an encryption format field, which is set by the implementer of the encryption and used as it sees fit.
#### KVM Agent
For the KVM storage pool types supported, the encryption has been implemented at Qemu itself, using the built-in LUKS storage support. This means that the storage remains encrypted all the way to the VM process, and decrypted before the block device is visible to the guest. This may not be necessary in order to implement encryption for /your/ storage pool type, maybe you have a kernel driver that decrypts before the block device on the system, or something like that. However, it seemed like the simplest, common place to terminate the encryption, and provides the lowest surface area for decrypted guest data.
For qcow2 based storage, `qemu-img` is used to set up a qcow2 file with LUKS encryption. For block based (currently just ScaleIO storage), the `cryptsetup` utility is used to format the block device as LUKS for data disks, but `qemu-img` and its LUKS support is used for template copy.
Any volume that requires encryption will contain a passphrase ID as a byte array when handed down to the KVM agent. Care has been taken to ensure this doesn't get logged, and it is cleared after use in attempt to avoid exposing it before garbage collection occurs. On the agent side, this passphrase is used in two ways:
1. In cases where the volume experiences some libvirt interaction it is loaded into libvirt as an ephemeral, private secret and then referenced by secret UUID in any libvirt XML. This applies to things like VM startup, migration preparation, etc.
2. In cases where `qemu-img` needs to use this passphrase for volume operations, it is written to a `KeyFile` on the cloudstack agent's configured tmpfs and passed along. The `KeyFile` is a `Closeable` and when it is closed, it is deleted. This allows us to try-with-resources any volume operations and get the KeyFile removed regardless.
In order to support the advanced syntax required to handle encryption and passphrases with `qemu-img`, the `QemuImg` utility has been modified to support the new `--object` and `--image-opts` flags. These are modeled as `QemuObject` and `QemuImageOptions`. These `qemu-img` flags have been designed to supersede some of the existing, older flags being used today (such as choosing file formats and paths), and an effort could be made to switch over to these wholesale. However, for now we have instead opted to keep existing functions and do some wrapping to ensure backward compatibility, so callers of `QemuImg` can choose to use either way.
It should be noted that there are also a few different Enums that represent the encryption format for various purposes. While these are analogous in principle, they represent different things and should not be confused. For example, the supported encryption format strings for the `cryptsetup` utility has `LuksType.LUKS` while `QemuImg` has a `QemuImg.PhysicalDiskFormat.LUKS`.
Some additional effort could potentially be made to support advanced encryption configurations, such as choosing between LUKS1 and LUKS2 or changing cipher details. These may require changes all the way up through the control plane. However, in practice Libvirt and Qemu currently only support LUKS1 today. Additionally, the cipher details aren't required in order to use an encrypted volume, as they're stored in the LUKS header on the volume there is no need to store these elsewhere. As such, we need only set the one encryption format upon volume creation, which is persisted in the volumes table and then available later as needed. In the future when LUKS2 is standard and fully supported, we could move to it as the default and old volumes will still reference LUKS1 and have the headers on-disk to ensure they remain usable. We could also possibly support an automatic upgrade of the headers down the road, or a volume migration mechanism.
Every version of cryptsetup and qemu-img tested on variants of EL7 and Ubuntu that support encryption use the XTS-AES 256 cipher, which is the leading industry standard and widely used cipher today (e.g. BitLocker and FileVault).
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
Fixes#6680
While finding CPU speed for KVM host following methods will be used in the same order:
1. lscpu
2. value in /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
3. virsh capabilities
4. libvirt nodeinfo
This will allow correct value for AMD based hosts when first two methods doesn't give a value
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
This PR provides constructors and the associated changes to use LibvirtVMDef for creating user mode network interfaces.
While this isn't used directly in the CloudStack KVM agent today, it could be used in the future for e.g. pod networking/management networks without needing to assign a pod IP. The VIF driver used by the CloudStack Agent is also pluggable, so this allows plugin code to create user mode network interfaces as well.
Note that the user mode network already exists in the GuestNetType enum, but wasn't usable prior to this change.
Also included unit test to ensure we continue to create the expected XML.
Additionally, this uncovered a null pointer on _networkRateKBps and this PR fixes it. The decision to add bandwidth throttling assumes this field is not null and simply checks for > 0.
Signed-off-by: Marcus Sorensen <mls@apple.com>
Co-authored-by: Marcus Sorensen <mls@apple.com>
Fixes#6621
Each time getMemStat() is called, a static value is returned. This value
should instead be refreshed to return the actual memory used.
Co-authored-by: Ruben Bosch <ruben.bosch@cldin.eu>
Fixes#6455
The default storage adaptor - LibvirtStorageAdaptor - is used by different storage types and doesn't use the annotation @StorageAdaptorInfo. In this case, a storage plugin that wants to adopt one of the predefined storage pool types will override the default behaviour. If fixing the issue in general (for new storage plugins or current ones that want to reuse the existing storage pool types) would affect all volume/snapshot/VM cases. This will lead to the need of extensive testing for each storage plugin for which we don't have the resources to do it. That's why this patch fixes the old behaviour for the SharedMountPoint by adding a new storage pool type for the StorPool plugin.
This PR fixes the issue #6209 where the snapshot revert operation fails after certain volume operations like Migrate VM with volume / migrate volume / reinstall VM.
The root cause of the issue after these volume operations, the primary storage entry is getting deleted for that volume. We have fixed it here to get the primary datastore entry wrt volume and continue the operation.
* Add more logs to migrate VM process in KVM
* Remove unused imports
* Verify if debug is enable before write the log string
* Fix conflicts
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Co-authored-by: SadiJr <sadi@scclouds.com.br>
This PR enhances the existing PowerFlex/ScaleIO storage plugin to support separate (storage) network for Hosts(KVM)/Storage connection, mainly the SDC (ScaleIo Data Client) connection.
* Fix extract snapshot from VM snapshot on KVM
* Fix validation expression - does not need to escape the slash
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
* kvm: truncate vnc password to 8 chars (#6244)
This PR truncates the vnc password of kvm vms to 8 chars to support latest versions of libvirt.
* Use lang3 string utils
Co-authored-by: Wei Zhou <weizhou@apache.org>
* agent: enable ssl only for kvm agent (not in system vms)
* Revert "agent: enable ssl only for kvm agent (not in system vms)"
This reverts commit b2d76bad2e.
* Revert "KVM: Enable SSL if keystore exists (#6200)"
This reverts commit 4525f8c8e7.
* KVM: Enable SSL if keystore exists in LibvirtComputingResource.java
* Improve log when live patching fails
* change patching path from /tmp to /var/cache/clou
* add iptable rule for console proxy (novnc)
* temporary template paths
* revert pom xml to original paths
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency
* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp
* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup
* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes
* Add ssh to k8s nodes details in the Access tab on the UI
* test
* Refactor ca/cert patching logic
* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script
* remove all references of systemvm.iso
* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs
* fix script timeout
* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand
* remove commented code + change core user to cloud for cks nodes
* Update ownership of ssh directory
* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)
* Add UI changes + move changes from patch file to runcmd
* test: validate performance for template modification during seeding
* create vms folder in cloudstack-commons directory - debian rules
* remove logic for on the fly template convert + update k8s test
* fix syntax issue - causing issue with shared network tests
* Code cleanup
* refactor patching logic - certs
* move logic of fixing rootdiskcontroller from upgrade to kubernetes service
* add livepatch option to restart network & vpc
* smooth upgrade of cks clusters
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency
* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp
* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup
* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes
* Add ssh to k8s nodes details in the Access tab on the UI
* test
* Refactor ca/cert patching logic
* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script
* remove all references of systemvm.iso
* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs
* fix script timeout
* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand
* remove commented code + change core user to cloud for cks nodes
* Update ownership of ssh directory
* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)
* Add UI changes + move changes from patch file to runcmd
* test: validate performance for template modification during seeding
* create vms folder in cloudstack-commons directory - debian rules
* remove logic for on the fly template convert + update k8s test
* fix syntax issue - causing issue with shared network tests
* Code cleanup
* add cgroup config for containerd
* add systemd config for kubelet
* add additional info during image registry config
* address comments
* add temp links of download.cloudstack.org
* address part of the comments
* address comments
* update containerd config - as version has upgraded to 1.5 from 1.4.12 in 4.17.0
* address comments - simplify
* fix vue3 related icon changes
* allow network commands when router template version is lower but is patched
* add internal LB to the list of routers to be patched on network restart with live patch
* add unit tests for API param validations and new helper utilities - file scp & checksum validations
* perform patching only for non-user i.e., system VMs
* add test to validate params
* remove unused import
* add column to domain_router to display software version and support networkrestart with livePatch from router view
* Requires upgrade column to consider package (cloud-scripts) checksum to identify if true/false
* use router software version instead of checksum
* show N/A if no software version reported i.e., in upgraded envs
* fix deb failure
* update pom to official links of systemVM template
When using advanced virtualization the IO Driver is not supported. The
admin will decide if want to enable/disable this configuration from
agent.properties file. The default value is true
* Refactor create volume snapshot with running VM
* Refactor create volume snapshot with stopped VM
* Refactor create volume from snapshot
* Refactor create template from snapshot
* Refactor volume migration (migrateVolume/ migrateVirtualMachineWithVolume)
* Refactor snapshot deletion
* Refactor snapshot revertion
* Adjusts and fix cherry-pick conflicts
* Remove diffuse tests
* Add validation to add flag '--delete' on command 'virsh blockcommand' only if libvirt version is equal or higher 6.0.0
* Expunge temporary snapshot only if template creation is from snapshot
* Extract strings to constant
* Remove unused imports
* Fix error on revert backed up snapshot
* Turn method's return to void as it is not used
* Rename method in SnapshotHelper
* Fix folder creation when using SharedMountPoint pool
* Remove static import
* Remove unnused method
* Cover take snapshot in centos 7
* Handle right snapshot flag according to qemu version
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
* VM snapshots of running KVM instance using storage providers plugins for disk snapshots
Added new virtual machine snapshot strategy which is using storage providers plugins to take/revert/delete snapshots.
You can take VM snapshot without VM memory on KVM instance, using storage providers implementations for disk snapshots.
Also revert and delete is added as functionality. Added Thaw/Freeze command for KVM instance.
The snapshots will be consistent, because we freeze the VM during the snapshotting. Backup to secondary storage is executed after
thaw of the VM and if it is enabled in global settings.
* Removed duplicated functionality
Set few methods in DefaultVMSnapshotStrategy to protected to reuse them
without duplicating the code. Remove code that is actualy not needed
* Added requirements in global setting kvm.vmstoragesnapshot.enabled
Added more information in kvm.vmstoragesnapshot.enabled global setting,
that it needs installation of:
- qemu version 1.6+
- qemu-guest-agent installed on guest virtual machine
when the option is enabled
* Added Apache license header
* Removed commented code
* If "kvm.vmstoragesnapshot.enabled" is null should be considered as false
* removed unused imports, replaced default template
Removed unused imports which causing failures and replaced template to
CentOS8
* "kvm.vmstoragesnapshot.enabled" set to dynamic
* Getting status of freeze/thaw commands not the return code
Will chacke the status if freeze/thaw of Guest VM succeded, rather than
looking for return code. Code refactoring
* removed "CreatingKVM" VMsnapshot state and events related to it
* renamed AllocatedKVM to AllocatedVM
the states should not be associated to a hypervisor type
* loggin the result of "drive-backup" command
* Check which VM snapshot strategy could handle the vm snapshots
gets the best match of VM snapshot strategy which could handle the vm
snapshots on KVM.
Other storage plugins could integrate with this functionality to support group snapshots
* Added poolId in canHandle for KVM hypervisors
Added poolId into canHandle method used to check if all volumes are on
the same PowerFlex's storage pool
* skip smoke tests if the hypervisor's OS type is CentOS
This PR works with functionality included in qemu-kvm-ev which
does not come by default on CentOS. The smoke tests will be skipped if
the hypervisor OS is CentOS
* Added missed import in smoke test
* Suggested change to use ` org.apache.commons.lang.StringUtils.isNotBlank`
* Fix getting device on Ubuntu
On Ubuntu the device isn't provided and we have to get it from
node-name parameter. For drive-backup command (for Ubuntu) is needed and job-id which
is the value of node-name (this extra param works on Ubuntu and CentOS as well).
* Removed new snapshot states and functionality for NFS
* throw CloudRuntimeException
provide a properer error message when delete VM snapshot fails
* exclude GROUP snapshots when listing snapshots
* Skip tests if there is pool with NFS/Local
* address comments
This enables jacoco, which didn't run before with the -P quality due to
missing passing of jacoco arg line to surefire plugin.
This also adds support for jacoco/quality builds using Github action and
posting of the PR coverage data using a new action step.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update VM priority (cpu_chares) when live scaling it
* Apply suggestions from code review
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Addressing javadoc review
* Addressing review typo
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* kvm: Use lscpu to get cpu max speed
* Fix str conversion
* Reorder
* Refactor
* Apply suggestions from code review
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Updated the calling method name getCpuSpeedFromCommandLscpu
* Make it more readable
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
Currently, our compute offerings and disk offerings are tightly coupled with respect to many aspects. For example, if a compute offering is created, a corresponding disk offering entry is also created with the same ID as the reference. Also creating compute offering takes few disk-related parameters which anyway goes to the corresponding disk offering only. I think this design was initially made to address compute offering for the root volume created from a template. Also changing the offering of a volume is tightly coupled with storage tags and has to be done in different APIs either migrateVolume or resizeVolume. Changing of disk offering should be seamless and should consider new storage tags, new size and place the volume in appropriate state as defined in disk offering.
more details are mentioned here https://cwiki.apache.org/confluence/display/CLOUDSTACK/Compute+offering+and+disk+offering+refactoring
* Schema changes and disk offering column change from "type" to "compute_only"
* Few more changes
* Decoupled service offering and disk offering
* Remove diskofferingid from vminstance VO
* Decouple service offering and disk offering states
* diskoffering getsize() is only for strict disk offerings
* Fix deployVM flow
* Added new API params to compute offering creation
* Add diskofferingstrictness to serviceoffering vo under quota
* Added overrideDiskOfferingId parameter in deploy VM API which will override disk offering for the root disk both in template and ISO case
Added diskSizeStrictness parameter in create Disk offering API which will decide whether to restrict resize or disk offering change of a volume
* Fix User vm response to show proper service offering and disk offerings
* Added disk size strictness in disk offering response
* Added disk offering strictness to the service offering response
* Remove comments
* Added UI changes for Disk offering strictness in add compute offering form and Disk size strictness in add disk offering form
* Added diskoffering details to the service offering response
* Added UI changes in deployvm wizard to accept override disk offering id
* Fix delete compute offering
* Fix VM deployment from custom service offering
* Move uselocalstorage column access from service offering to disk offering
* UI: Separated compute and disk releated parameters in add compute offering wizard, also added association to disk offering
* Fixed diskoffering automatic selection on add compute offering wizard
* UI: move compute only toggle button outside the box in add compute offering wizard
* Added volumeId parameter to listDiskOfferings API and the disksizestrictness flag of the current disk offering is honored while list disk offerings
* Added configuration parameter to decide whether to check volume tags on the destination storagepool during migration
* Added disk offering change checks during resize volume operation
* Added new API changeofferingforVolume API and corresponding changes
* Add UI form for changeOfferingForVolume API
* Fix UI conflicts
* Fix service offering usage as disk offering
* Fix unit test failures
* fix user_vm_view
* Addressed review comments
* Fixed service_offering_view
* Fix service offering edit flow
* Fix service offering constructor to address custom offering
* Fix domain_router_view to get proper service offering id
* Removed unused import
* Addressed review comments and fixed update service offering flow with storage tags
* Added marvin test cases for checking disk offering strictness
* review comments addressed
* Remove system_use column from disk offering join
* update volume_view to update system_use column from service offering and not disk offering
* Fix changeOfferingForVolume API for custom disk offering
* Fix global setting implementation
* Fix list volumes, after changing system_use column from disk offering to service offering in volume_view
* Changes for override root disk offering in deployvm wizard in case of custom offering
* Fix a unit test case
* Fixed recent unit test cases with new serviceofferingvo constructor
* Fix unit test in VolumeApiServiceImpl
* Added storage id for the list disk offering API and corresponding UI changes in migrateVolume and changeOfferingForVolume flow
* Rename global configuration parameter from storage.pool.tags.disk.offering.strictness to match.storage.pool.tags.with.disk.offering
* Fix smoke test failures
* Added tool tip for migrate volume UI form
* Address review comments and fix UI form of deploy VM in case of ISO.
* Fixed resize volume UI form for data disk
* UI changes to disable override root disk size when override root disk offering is enabled
* UI fix in deploy vm wizard
* Fix listdiskoffering after rebasing with main
* Fixed UI in migrate and changeofferingfor volume to handle empty disk offering list
Removed the volume's current disk offering from listDiskOffering response list
* Added custom Iops to resize volume form and removed the current disk offering during change offering for volume UI form
* Fix false response on updateDiskOffering API
* Added search field for changeofferingforvolume UI form
* Fix resize volume and migrate volume to update volume path if DRS is applied on volume in datastore cluster
* Removed DB changes from 4.16 upgrade file
* Resolving merge conflicts with main 4.17
* Added support for auto migration and auto resize of the root volume upon changing the service offering for VM.
* UI: Added automigrate checkbox in scale VM form
* Addes since attributes to new API params
* Added shrinkOK parameter to changeofferingforvolume API
* Added shrinkOk param to UI in changeOfferingforVolume form
* Added shrinkOk flag to scaleVM and changeServiceForVirtualMachines and UI form
* Removed old foreign key constraint on IDs of service offering and disk offering
* Allow resize and automigrate of root volume if required in all cases of service offering change
* Allow only resize to higher disk size from UI
* Fixing vue syntax error
* Make UI changes to provide root disk size box when the linked disk offering is of custom
* Converted from check box to toggle in scale VM, changeoffering, resize and migrate volume forms
* Fix resize volume operation to update the VM settings
* Fix migratevolume form to pick selected storage pool id in list diskofferings API
* kvm: don't force scsi controller for aarch64 VMs
This would allow use of virtio disk controller with Ceph, etc or as
defined in the VM's root disk controller setting, rather than always
enforce SCSI.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* remove test that doesn't apply now
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* address review comment
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
With diskful set to true, linstor will fail if it cannot create local
storage for the resource. Which in turn will make it impossible to have a
setup with just compute nodes on cloudstack.
* KVM: Add MV Settings for virtual GPU hardware type and memory
* fix method createVideoDef argument in test package
* add available options for KVM virtual GPU hardware VM setting
* fix videoRam default value
* fix _videoRam is 0, it will use default provided by libvirt
This adds a volume(primary) storage plugin for the Linstor SDS.
Currently it can create/delete/migrate volumes, snapshots should be possible,
but currently don't work for RAW volume types in cloudstack.
* plugin-storage-volume-linstor: notify libvirt guests about the resize
* Fix of creating volumes from snapshots without backup
When few snaphots are created onyl on primary storage, and try to create
a volume or a template from the snapshot only the first operation is
successful. Its because the snapshot is backup on secondary storage with
wrong SQL query. The problem appears on Ceph/NFS but may affects other
storage plugins.
Bypassing secondary storage is implemented only for Ceph primary storage
and it didn't cover the functionality to create volume from snapshot
which is kept only on Ceph
* Address review
* Create utility to centralize byte convertions
* Add/change toString definitions
* Create Libvirt handler to ScaleVmCommand
* Enable dynamic scalling VM with KVM
* Move config from interface to class and rename it
As every variable declared in interfaces are already final,
this moving will be needed to mock tests in nexts commits
* Configure VM max memory and cpu cores
The values are according to service offering or global configs
* Extract dpdk configuration to a method and test it
* Extract OS desc config to a method and test it
* Extract guest resource def to a method and test it
Improve libvirt def
* Refactor LibvirtVMDef.GuestResourceDef
* Refactor ScaleVmCommand
* Improve VMInstaVO toString()
* Refactor upgradeRunningVirtualMachine method
* Turn int variables into long on utility
* Verify if VM is scalable on KVMGuru
* Rename some KVMGuruTest's methods
* Change vm's xml to work with max memory
* Verify if service offering is dynamic before scale
* Create methods to retrieve data from domain
* Create def to hotplug memory
* Adjust the way command was scaling the VM
* Fix database persistence before executing command
* Send more info to host to improve log
* Fix var name
* Fix missing "}"
* Undo unnecessary changes
* Address review
* Fix scale validation
* Add VM prepared for dynamic scaling validation
* Refactor LibvirtScaleVmCommandWrapper and improve unit tests
* Remove duplicated method
* Add RuntimeException check
* Remove copyright from header
* Remove copyright from header
* Remove copyright from header
* Remove copyright from header
* Remove copyright from header
* Update ByteScaleUtilsTest.java
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* Add SharedMountPoint to KVMs supported storage pool types
* Fix live migration to iSCSI and improve logs
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* Externalize KVM Agent storage's timeout configuration
* Address @nvazquez review
* Add empty line at the end of the agent.properties file
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
* Refactor method createVMFromSpec
* Add unit tests
* Fix test
* Extract if block to method for add extra configs to VM Domain XML
* Split travis tests trying to isolate which test is causing an error
* Override toString() method
* Update documentation
* Fix checkstyle error (line with trailing spaces)
* Change VirtualMachineTO print of object
* Add try except to find message error. Remove after test
* Fix indent
* Trying to understanding why is happening in this code
* Refactor method createVMFromSpec
* Add unit tests
* Fix test
* Extract if block to method for add extra configs to VM Domain XML
* Split travis tests trying to isolate which test is causing an error
* Override toString() method
* Update documentation
* Fix checkstyle error (line with trailing spaces)
* Remove unnecessary comment
* Revert travis tests
Co-authored-by: SadiJr <17a0db2854@firemailbox.club>
* Externalize KVM Agent storage's timeout configuration
Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.
* Refactored KVHAMonitor nested thread and changed some logs
* It has been added the timeout's config in the agent.properties file
* Rename classes
* Rename var and remove comment
* Fix typo with word "heartbeat"
* Extract multiple methods call to variables
* Add unit tests to file handler
* Increase info about the property
* Create inner class Property
* Rename method getProperty to getPropertyValue
* Remove copyright
* Remove copyright
* Extract code to createHeartBeatCommand
* Change method access from protected to private
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
On newer libvirt/qemu it seems PCI hot-plugging could be an issue as
seen in:
https://www.suse.com/support/kb/doc/?id=000019383https://bugs.launchpad.net/nova/+bug/1836065
This was found to be true on ARM64/aarch64 platform (tested on
RaspberryPi4). As per the default machine doc, it advises to
pre-allocate PCI controllers on the machine and pcie-to-pci-bridge based
controller for legacy PCI models:
https://libvirt.org/pci-hotplug.html#x86_64-q35
This patch introduces the concept as a workaround until a proper fix is
done (ideally in the upstream libvirt/qemu projects). Until then client
code can add 32 PCI controllers and a pcie-to-pci-bridge controller for
aarch64 platforms.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Currently there is no disk IO driver configuration for VMs running on KVM. That's OK for most the cases; however, recently there have been added some quite interesting optimizations with the IO driver io_uring.
Note that IO URING requires:
Qemu >= 5.0, and
Libvirt >= 6.3.0.
By using io_uring we can see a massive I/O performance improvement within Virtual Machines running from Local and/or NFS storage.
This implementation enhances the KVM disk configuration by adding workflow for setting the disk IO drivers. Additionally, if the Qemu and Libvirt versions matches with the required for having io_uring we are going to set it on the VM. If there is no support for such driver we keep it as it is nowadays, without any IO driver configured.
Fixes: #4883
* server: fix failed to apply userdata when enable static nat
* server: fix cannot expunge vm as applyUserdata fails
* configdrive: fix ISO is not recognized when plug a new nic
* configdrive: detach and attach configdrive ISO as it is changed when plug a new nic or migrate vm
* configdrive test: (1) password file does not exists in recreated ISO; (2) vm hostname should be changed after migration
* configdrive: use centos55 template with sshkey and configdrive support
* configdrive: disklabel is 'config-2' for configdrive ISO
* configdrive: use copy for configdrive ISO and move for other template/volume/iso
* configdrive: use public-keys.txt
* configdrive test: fix (1) update_template ; (2) ssh into vm by keypair
This PR intends to improve logging on agent start to facilitate troubleshooting.
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
* vxlan: arp does not work between hosts as multicast group is communicated over physical nic instead of linux bridge
when linux bridge is setup (refer to http://docs.cloudstack.apache.org/projects/archived-cloudstack-getting-started/en/latest/networking/vxlan.html#configure-product-to-use-vxlan-plugin) and used as the kvm traffic label of physical networks, the vms on different hosts cannot reach each other.
(1) does not work:
```
/usr/share/cloudstack-common/scripts/vm/network/vnet/modifyvxlan.sh -v 1001 -p eth1 -b brvx-1001 -o add
```
"bridge fdb" shows
```
00:00:00:00:00:00 dev vxlan1001 dst 239.0.3.233 via eth1 self permanent
```
(2) this works:
```
/usr/share/cloudstack-common/scripts/vm/network/vnet/modifyvxlan.sh -v 1001 -p cloudbr1 -b brvx-1001 -o add
```
"bridge fdb" shows
```
00:00:00:00:00:00 dev vxlan1001 dst 239.0.3.233 via cloudbr1 self permanent
```
* vxlan: fix issue if kvm network label is not set
* Fix of some UEFI related issues
1 - fix of attach/detach ISO of VM with UEFI boot type
2 - if OS type of an ISO is categorized as "Other" the bus type of the disk
will be set to "sata"
* Simplify the validation of OS types
Datastore cluster as a primary storage support is already there. But if any changes at vCenter to datastore cluster like addition/removal of datastore is not synchronised with CloudStack directly. It needs removal of primary storage from CloudStack and add it again to CloudStack.
Here synchronisation of datastore cluster is fixed without need to remove or add the datastore cluster.
1. A new API is introduced syncStoragePool which takes datastore cluster storage pool UUID as the parameter. This API checks if there any changes in the datastore cluster and updates management server accordingly.
2. During synchronisation if a new child datastore is found in datastore cluster, then management server will create a new child storage pool in database under the datastore cluster. If the new child storage pool is already added as an individual storage pool then the existing storage pool entry will be converted to child storage pool (instead of creating a new storage pool entry)
3. During synchronisaton if the existing child datastore in CloudStack is found to be removed on vCenter then management server removes that child datastore from datastore cluster and makes it an individual storage pool.
The above behaviour is on par with the vCenter behaviour when adding and removing child datastore.
This PR fixes#4244
deploying of VMs from ISOs and from templates with UEFI boot type
deploying of VMs from ISOs and from templates with UEFI boot type with
volumes in RAW format
This PR aims at introducing persistence mode in L2 networks and enhancing the behavior in Isolated networks
Doc PR apache/cloudstack-documentation#183
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
* Updated libvirt's native reboot operation for VM on KVM using ACPI event, and Added 'forced' reboot option to stop and start the VM (using rebootVirtualMachine API)
* Added 'forced' reboot option for System VM and Router
- New parameter 'forced' in rebootSystemVm API, to stop and then start System VM
- New parameter 'forced' in rebootRouter API, to force stop and then start Router
* Added force reboot tests for User VM, System VM and Router
These changes are related to PR #3194, but include suspending/resuming the VM when doing a VM snapshot as well, when deleting a VM snapshot, as it is performing the same operations via Libvirt. Also, there was an issue with the UI/localization changes in the prior PR, as that PR was altering the Volume snapshot behavior, but was altering the VM snapshot wording. Both have been altered in this PR.
Issuing this in response to the work happening in PR #4029.
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ
Documentation PR: apache/cloudstack-documentation#169
This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack
Other improvements addressed in addition to PowerFlex/ScaleIO support:
- Added support for config drives in host cache for KVM
=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
- Updated full deployment destination for preparing the network(s) on VM start
- Propagate the direct download certificates uploaded to the newly added KVM hosts
- Discover the template size for direct download templates using any available host from the zones specified on template registration
=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
- Retry VM deployment/start when the host cannot grant access to volume/template
- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
- Check the router filesystem is writable or not, before performing health checks
=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)
* Addressed some issues for few operations on PowerFlex storage pool.
- Updated migration volume operation to sync the status and wait for migration to complete.
- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.
- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
Other fixes included:
- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
- Perform basic tests (for connectivity and file system) on router before updating the health check config data
=> Validate the basic tests (connectivity and file system check) on router
=> Cleanup the health check results when router is destroyed
* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0
* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool
and Minor improvements in PowerFlex/ScaleIO storage plugin code
* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.
- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.
* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py
* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.
* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
In previous cloudstack versions, qcow2 image does not have a backing file format.
however, it is required in newer qemu versions, for example qemu 4.2 on ubuntu 20.04.
steps to reproduce the issue
(1) install cloudstack 4.14 or previous version, and ubuntu 19.04 or 18.04/16.04 LTS.
(2) create vms.
(3) upgrade to 4.15, upgrade os to ubuntu 20.04 , or install a new server with ubuntu 20.04.
(4) migrate vm from old ubuntu version to ubuntu 20.04, failed with exception below
```
2021-02-04 13:43:07,397 DEBUG [resource.wrapper.LibvirtMigrateCommandWrapper] (agentRequest-Handler-1:null) (logid:93da9385) ExecutionException : org.libvirt.LibvirtException: Requested operation is not valid: format of backing image '/mnt/03b6f487-9eaf-38bf-ad2d-d985423b832f/66990fcc-fd98-4932-9649-989bf6583d59' of image '/mnt/03b6f487-9eaf-38bf-ad2d-d985423b832f/a3dd1f0f-2557-4e07-951c-e4eb7b3f38b2' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)
```
(5)stop vm, and start it on ubuntu 20.04 server. failed with exception below
```
2021-02-04 13:46:29,766 WARN [resource.wrapper.LibvirtStartCommandWrapper] (agentRequest-Handler-5:null) (logid:b54745a7) LibvirtException
org.libvirt.LibvirtException: Requested operation is not valid: format of backing image '/mnt/03b6f487-9eaf-38bf-ad2d-d985423b832f/66990fcc-fd98-4932-9649-989bf6583d59' of image '/mnt/03b6f487-9eaf-38bf-ad2d-d985423b832f/a3dd1f0f-2557-4e07-951c-e4eb7b3f38b2' was not specified in the image metadata (See https://libvirt.org/kbase/backing_chains.html for troubleshooting)
```
To make testing easier, step 1 and 2 can be replaced by
```
qemu-img create -f qcow2 -b <backing file> <qcow2 image>
```
so qcow2 image does not have a backing file format.
* 4.15:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
* 4.14:
server: select root disk based on user input during vm import (#4591)
kvm: Use Q35 chipset for UEFI x86_64 (#4576)
server: fix wrong error message when create isolated network without SourceNat (#4624)
server: add possibility to scale vm to current customer offerings (#4622)
server: keep networks order and ips while move a vm with multiple networks (#4602)
server: throw exception when update vm nic on L2 network (#4625)
doc: fix typo in install notes (#4633)
The bus type to `data disk` volumes is hardcoded to `virtio` or `scsi`, when using virtio-scsi (or, based on the template type). Therefore, there is no way to specify the bus type to data disk volumes (as we have for root disks).
This PR intends to replicate the `rootDiskController` behavior to `dataDiskController`, allowing the definition of the controller.
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Add RBD main storage through UI, it will fail when there is no host port parameter;
Because when we created the pool, we did not add the port target in the xml
This fixes issue introduced in c3554ec31d
which enable block of code that will double escape rados host/monitor
port.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR fixes a regression issue in #4497
In cloudstack 4.14 or before, the cpu topology is set only when cpucore per socket is set (to 4 or 6).
in other conditions, there is no cpu topology in vm xml definition.
with #4497, vm will have cpu topology in its xml definition, if cpucore per socket is not set.
<topology sockets='<vm cpu cores>' cores='1' threads='1'/>
Not sure if it causes any issue. I think it would be better not to add this part in vm xml definition if cpucore per socket is not set.
Added property to agent.properties that enables or disables the iscsi session clean up feature. #4210
Added a condition to prevent disk partitions from being cleaned up. #4216
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
* DB : Add support for MySQL 8
- Splits commands to create user and grant access on database, the old
statement is no longer supported by MySQL 8.x
- `NO_AUTO_CREATE_USER` is no longer supported by MySQL 8.x so remove
that from db.properties conn parameters
For mysql-server 8.x setup the following changes were added/tested to
make it work with CloudStack in /etc/mysql/mysql.conf.d/mysqld.cnf and
then restart the mysql-server process:
server_id = 1
sql-mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_ENGINE_SUBSTITUTION"
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=1000
log-bin=mysql-bin
binlog-format = 'ROW'
default-authentication-plugin=mysql_native_password
Notice the last line above, this is to reset the old password based
authentication used by MySQL 5.x.
Developers can set empty password as follows:
> sudo mysql -u root
ALTER USER 'root'@'localhost' IDENTIFIED BY '';
In libvirt repository, there are two related commits
2019-08-23 13:13 Daniel P. Berrangé ● rpm: don't enable socket activation in upgrade if --listen present
2019-08-22 14:52 Daniel P. Berrangé ● remote: forbid the --listen arg when systemd socket activation
In libvirt.spec.in
/bin/systemctl mask libvirtd.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-ro.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-admin.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tls.socket >/dev/null 2>&1 || :
/bin/systemctl mask libvirtd-tcp.socket >/dev/null 2>&1 || :
Co-authored-by: Wei Zhou <w.zhou@global.leaseweb.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.
Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }
The KB MB and GB values will be printed out:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
Ceph used to use port 6789 (no need to specify it), but with the messenger v2
from Ceph it switched to port 3300 while 6789 still works.
librados/librbd/libvirt will automatically figure out the ports to use if none is
specified.
Therefor there is no need for CloudStack to explicitely define the port in the XML
passed to Libvirt or Qemu.
Leave blank if no port number has been defined by the user.
When you migrate volume between data stores CS keeps the original UUID and changes the path of the volume.
When volume is not found by the given path the agent throws CloudRuntimeException but it's not catched in LibvirtGetVolumeStatsCommandWrapper.java
* 4.13:
Snapshot deletion issues (#3969)
server: Cannot list affinity group if there are hosts dedicated… (#4025)
server: Search zone-wide storage pool when allocation algothrim is firstfitleastconsumed (#4002)
* Fixes snapshot deletion
* Remove legacy '@Component', it is not necessary in this bean/class.
* Fix log message missing %d and remove snapshot on DB
* Remove "dummy" boolean return statement
* Manage snapshot deletion for KVM + NFS (primary storage)
* checkstyle trailing spaces
* rename options strings to *_OPTION
* Fix typo on deleteSnapshotOnSecondaryStorage and enhance log message
* Move the snapshotDao.remove(snapshotId); (#4006)
* Fix deletesnapshot worflow to handle both snapshots created in primary storage and snapshots backed up to secondary storage
* Fix extra space
* refactor out separate handling methods for secondary and primary (reducing returns)
* return false on unexpected error or log when expected
* != instead of ==
* secondary instead of backup storage
* init to null
* Handle snapshot deletion on primary storage. When primary store ref not found for snapshot do not fail the operation.
* Fix debug levels on log messages
Co-authored-by: GabrielBrascher <gabriel@apache.org>
Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Co-authored-by: Harikrishna Patnala <harikrishna.patnala@gmail.com>
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Remove constraint for NFS storage
* Add new property on agent.properties
* Add free disk space on the host prior template download
* Add unit tests for the free space check
* Fix free space check - retrieve avaiable size in bytes
* Update default location for direct download
* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope
* Verify location for temporary download exists before checking free space
* In progress - refactor and extension
* Refactor and fix
* Last fixes and marvin tests
* Remove unused test file
* Improve logging
* Change default path for direct download
* Fix upload certificate
* Fix ISO failure after retry
* Fix metalink filename mismatch error
* Fix iso direct download
* Fix for direct download ISOs on local storage and shared mount point
* Last fix iso
* Fix VM migration with ISO
* Refactor volume migration to remove secondary storage intermediate
* Fix simulator issue
This adds support for JDK11 in CloudStack 4.14+:
- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* [CLOUDSTACK-10408] Fix String.replaceAll() to replace() for better performance
* improve with replace char but string
Co-authored-by: Rohit Yadav <rohit@apache.org>
* * Complete API implementation
* Complete UI integration
* Complete marvin test
* Complete Secondary storage GC background task
* improve UI labels
* slight reword and add another missing description
* improve download message clarity
* Address comments
* multiple fixes and cleanups
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* fix more bugs, let it return ip rule list in another log file
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* fix missing iprule bug
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* add support for ARCHIVE type of object to be linked/setup on secstorage
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Fix retrieving files for Xenserver
* Update get_diagnostics_files.py
* Fix bug where executable scripts weren't handled
* Fixed error on script cmd generation
* Do not filter name for log files as it would override similar prefix script names
* Addressed code review comments
* log error instead of printstacktrace
* Treat script as executable and shell script
* Check missing script name case and write to output instead of catching exception
* Use shell = true instead of shlex to support any executable
* fix xenserver bug
* don't set dir permission for vmware
* Code review comments - refactoring
* Add check for possible NPE
* Remove unused imoprt after rebase
* Add better description for configs
Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
Co-authored-by: Rohit Yadav <rohit@apache.org>
Co-authored-by: Anurag Awasthi <anurag.awasthi@shapeblue.com>
* Suqash commits to a single commit and rebase against master
Update marvin tests to use white list
* * Fix marvin test failure
* Add new marvin negative tests cases
* Remove hard-coded hypervisor types in marvin tests
* Fix build error after rebase and add hugepagesless
* Fix readability of python code
* Fix failing test
* Adding cleanup of vms for negative tests
* Bug fixes - change config checks properly and block extraconfig in details
* Trim to compare the keys
* CR comments
* Don't skip extraconfig without exception
Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
* Avgload (#2)
* Adding avgload for kvm
* Fix coding style issue
* Add getter/setter
* Fix several small errors
* Add override
* Uncomment getAverageLoad
* Override getAverageLoad()
* Checkstyle bug?
* Delete trailing spaces
* Renaming function
* Change interface to match
* Rename method in GetHostStatsAnswer
* Change method call name
* Convert double to long
* Remove trailing whitespace
* Change names around
* Make load visible to return it
* Parse string to double
* Change Long to Double
* Fix getter
* Unify naming to cpuloadaverage
* Change cpuloadaverage String to Double in listHostsMetrics
Remove some unnecessary whitespaces
* Add CPU_LOAD_AVERAGE to ApiConstants
When I add a secondary IP to a nic on shared network in advanced zone with security groups, the network rules for new IP are not applied on KVM hypervisors.
It is because "--action -A" cannot be recognized in security_group.py after commit ac73e7e671. changing to "--action=-A" will fix it.
Fixes issue #3590 by using the last element on the array from the snapshot "path" String for retrieving the snapshot id. Additionally, it uses the volumePath as the volume id which should always be the correct value. The error raised on issue #3590 was related to the wrong use of variable "path" where in some cases had a different set of substrings.
The proposed change has been tested and evaluated. The values used for openning the RBD connection and executing the rollback were stable on the tests. Runned rollback on multiple snapshots and could start the VM with the content matching the ROOT reverted snapshot.
KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.
This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/
I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.
I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have
KVM enabled by default.
Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* kvm: Use 'ip' instead of 'brctl'
The command 'brctl' is deprecated and should no longer be used.
iproute2 supports all the features we need and therefor we should use
this instead of the old commands.
Feature wise this does not change anything. It just makes the code more
robust towards the future.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* kvm/modifyvlan: Use 'ip' instead of 'brctl'
brctl is deprecated and by using iproute2 we are future-proof
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Fix regression bug that affects KVM local storage migration. Some of the desired execution flows for KVM local storage migration had been altered to allow only managed storage to execute. Fixed allowing managed and non managed storages to execute.
Fixes#3521
There are certain scenarios where the 169.254.0.0/16 subnet is used for different
purposes then CloudStack on a hypervisor.
Once of such scenarios is a BGP+EVPN+VXLAN setup using BGP Unnumbered where the
169.254.0.1 address is used by Frr/Zebra BGP routing to send traffic to the
neighboring router.
The following settings can be changed in the agent.properties (default values added):
control.cidr=169.254.0.0/16
Make sure the global setting 'control.cidr' matches the values defined in the agent.propeties!
In the future the mgmt server can send this parameter to a KVM Agent on startup, but at the moment
this framework is not in place and thus these values can't be send to the Agent in a proper manner.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Currently when refreshing disk usage stats all kvm agents are asked to collect stats for all volumes. In setups with multiple kvm hosts where managed storage is used, not all volumes are attached to all kvm hosts, this results in a large number of warnings in the kvm agent logs. This change introduces a filter step in case managed storage is used so that the management server only requests kvm agents for stats about volumes that are connected to each kvm host.
Add CephSnapshotStrategy to handle RBD revert (rollback) snapshot. In order to support RBD revert (rbd_rollback), this PR adds a CephSnapshotStrategy class to handle Ceph/RBD snapshot actions.
* Add revoke certificates API
* Add background task to sync certificates
* Fix marvin test and revoke certificate
* Fix certificate sent to hypervisor was missing headers
* Fix background task for uploading certificates to hosts
This change addresses #3089. There was an issue when disks were being added with bus type IDE when creating windows VMs from ISOs. It is not possible to select bus type when creating a VM with an ISO. The bus type is inferred based on the platform emulator string provided to the KVM agent. Currently when creating a VM with managed storage (ex: Solidfire) and OS type string Windows*, all disks are added as IDE. Qemu currently does not support multiple IDE controllers and this configuration results in VMs that cannot be started. This issue does not occur when using NFS as the storage provider due to logic in that KVM agent that makes all data volumes (non root) use a virtio controller for file based disk. Similar logic was added for raw physical disks so that managed storage has the same behavior as NFS. In addition specific versions were removed from the code that guesses the disk controller to be used based on the platform emulator string since most modern operating systems support virtio.
Fixes#3089
Feature Specification: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95653548
Live storage migration on KVM under these conditions:
From source and destination hosts within the same cluster
From NFS primary storage to NFS cluster-wide primary storage
Source NFS and destination NFS storage mounted on hosts
In order to enable this functionality, database should be updated in order to enable live storage capacibilty for KVM, if previous conditions are met. This is due to existing conflicts between qemu and libvirt versions. This has been tested on CentOS 6 hosts.
Additional notes:
To use this feature set the storage_motion_supported=1 in the hypervisor_capability table for KVM. This is done by default as the feature may not work in some environments, read below.
This feature of online storage+VM migration for KVM will only work with CentOS6 and possible Ubuntu as KVM hosts but not with CentOS7 due to:
https://bugs.centos.org/view.php?id=14026https://bugzilla.redhat.com/show_bug.cgi?id=1219541
On CentOS7 the error we see is: " error: unable to execute QEMU command 'migrate': this feature or command is not currently supported" (reference https://ask.openstack.org/en/question/94186/live-migration-unable-to-execute-qemu-command-migrate/). Reading through various lists looks like the migrate feature with qemu may be available with paid versions of RHEL-EV but not centos7 however this works with CentOS6.
Fix for CentOS 7:
Create repo file on /etc/yum.repos.d/:
[qemu-kvm-rhev]
name=oVirt rebuilds of qemu-kvm-rhev
baseurl=http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/
mirrorlist=http://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-3.5-el7Server
enabled=1
skip_if_unavailable=1
gpgcheck=0
yum install qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64 qemu-kvm-ev-2.3.0-29.1.el7.x86_64 qemu-img-ev-2.3.0-29.1.el7.x86_64
Reboot host
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
To make sure that a qemu2-image won't be corrupted by the snapshot deletion procedure which is being performed after copying the snapshot to a secondary store, I'd propose to put a VM in to suspended state.
Additional reference: https://bugzilla.redhat.com/show_bug.cgi?id=920020#c5Fixes#3193
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
* DPDK vHost User mode selection
* SQL text field and DPDK classes refactor
* Fix NullPointerException after refactor
* Fix unit test
* Refactor details type
When I use SandyBridge as custom cpu in my testing, vm failed to start due to following error:
```
org.libvirt.LibvirtException: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: avx, xsave, aes, tsc-deadline, x2apic, pclmuldq
```
With this patch, it works with the following setting in agent.properties:
```
guest.cpu.mode=custom
guest.cpu.model=SandyBridge
guest.cpu.features=-avx -xsave -aes -tsc-deadline -x2apic -pclmuldq
```
vm cpu is defined as below:
```
<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<feature policy='disable' name='avx'/>
<feature policy='disable' name='xsave'/>
<feature policy='disable' name='aes'/>
<feature policy='disable' name='tsc-deadline'/>
<feature policy='disable' name='x2apic'/>
<feature policy='disable' name='pclmuldq'/>
</cpu>
```
On first startup, the management server creates and saves a random
ssh keypair using ssh-keygen in the database. The command does
not specify keys in PEM format which is not the default as generated
by latest ssh-keygen tool.
The systemvmtemplate always needs re-building whenever there is a change
in the cloud-early-config file. This also tries to fix that by introducing a
stage 2 bootstrap.sh where the changes specific to hypervisor detection
etc are refactored/moved. The initial cloud-early-config only patches
before the other scripts are called.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Fixes PR #3146 db cleanup to the correct 4.12->4.13 upgrade path
- Fixes failing unit test due to jdk specific changes after forward
merging
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new patching script for patching systemvms on KVM
using qemu-guest-agent that runs inside the systemvm on startup. This
also removes the vport device which was previously used by the legacy
patching script and instead uses the modern and new uniform guest
agent vport for host-guest communication.
Also updates the sytemvmtemplate build config to use the latest Debian
9.9.0 iso.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* ubuntu16: fix unable to add host if cloudbrX is not configured
while add a ubuntu16.04 host with native eth0 (cloudbrX is not configured),
the operation failed and I got the following error in /var/log/cloudstack/agent/setup.log
```
DEBUG:root:execute:ifconfig eth0
DEBUG:root:[Errno 2] No such file or directory
File "/usr/lib/python2.7/dist-packages/cloudutils/serviceConfig.py", line 38, in configration
result = self.config()
File "/usr/lib/python2.7/dist-packages/cloudutils/serviceConfig.py", line 211, in config
super(networkConfigUbuntu, self).cfgNetwork()
File "/usr/lib/python2.7/dist-packages/cloudutils/serviceConfig.py", line 108, in cfgNetwork
device = self.netcfg.getDefaultNetwork()
File "/usr/lib/python2.7/dist-packages/cloudutils/networkConfig.py", line 53, in getDefaultNetwork
pdi = networkConfig.getDevInfo(dev)
File "/usr/lib/python2.7/dist-packages/cloudutils/networkConfig.py", line 157, in getDevInfo
elif networkConfig.isBridge(dev) or networkConfig.isOvsBridge(dev):
```
The issue is caused by commit 9c7cd8c248
2017-09-19 16:45 Sigert Goeminne ● CLOUDSTACK-10081: CloudUtils getDevInfo function will now return "bridge" instead o
* ubuntu16: Stop service libvirt-bin.socket while add a host
service libvirt-bin.socket will be started when add a ubuntu 16.04 host
DEBUG:root:execute:sudo /usr/sbin/service libvirt-bin start
However, libvirt-bin service will be broken by it after restarting
Stopping service libvirt-bin.socket will fix the issue.
An example is given as below.
```
root@node32:~# /etc/init.d/libvirt-bin restart
[ ok ] Restarting libvirt-bin (via systemctl): libvirt-bin.service.
root@node32:~# virsh list
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
root@node32:~# systemctl stop libvirt-bin.socket
root@node32:~# /etc/init.d/libvirt-bin restart
[ ok ] Restarting libvirt-bin (via systemctl): libvirt-bin.service.
root@node32:~# virsh list
Id Name State
----------------------------------------------------
```
* ubuntu16: Diable libvirt default network
By default, libvirt will create default network virbr0 on kvm hypervisors.
If vm uses the same ip range 192.168.122.0/24, there will be some issues.
In some cases, if we run tcpdump inside vm, we will see the ip of kvm hypervisor as source ip.
* Mock Scanner, instead of scan the computer running the test.
This allows non linux machines to run the tests without scanning for a
non existing /proc/meminfo.
* test fixes on 'other' platforms libvirt wrapper unit tests (#3)
* Keep iotune section in the VM's XML after live migration
When live migrating a KVM VM among local storages, the VM loses the
<iotune> section on its XML, therefore, having no IO limitations.
This commit removes the piece of code that deletes the <iotune> section
in the XML.
* Add test for replaceStorage in LibvirtMigrateCommandWrapper
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* Fix Javadoc for method replaceIpForVNCInDescFile
* feature: add libvirt / qemu io bursting
Adds the ability to set bursting features from libvirt / qemu
This allows you to utilize the iops and bytes temporary "burst" mode
introduced with libvirt 2.4 and improved upon with libvirt 2.6.
https://blogs.igalia.com/berto/2016/05/24/io-bursts-with-qemu-2-6/
* updates per rafael et al
The KVM Agent had two mechanisms for reporting its capabilities
and memory to the Management Server.
On startup it would ask libvirt the amount of Memory the Host has
and subtract and add the reserved and overcommit memory.
When the HostStats were however reported to the Management Server
these two configured values on the Agent were no longer reported
in the statistics thus showing all the available memory in the
Agent/Host to the Management Server.
This commit unifies this by using the same logic on Agent Startup
and during statistics reporting.
memory=3069636608, reservedMemory=1073741824
This was reported by a 4GB Hypervisor with this setting:
host.reserved.mem.mb=1024
The GUI (thus API) would then show:
Memory Total 2.86 GB
This way the Agent properly 'lies' to the Management Server about its
capabilities in terms of Memory.
This is very helpful if you want to overprovision or undercommit machines
for various reasons.
Overcommitting can be done when KSM or ZSwap or a fast SWAP device is
installed in the machine.
Underprovisioning is done when the Host might run other tasks then a KVM
hypervisor, for example when it runs in a hyperconverged setup with Ceph.
In addition internally many values have been changed from a Double to a Long
and also store the amount of bytes instead of Kilobytes.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: Replace deprecated optparse by argparse
Starting with Python 2.7 the library optparse has been replaced by
argpase.
This commit replaces the use of optparse by argparse
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: Remove LXC support from security_group.py
LXC does not work and has been partially removed from CloudStack already
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: Refactor libvirt code
Use a single function which properly throws an Exception when the
connection to libvirt fails.
Also simplify some logic, make it PEP-8 compatible and remove a unused
function from the code.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: Raise Exception on execute() failure
If the executed command exists with a non-zero exit status we should
still return the output to the command, but also raise an Exception.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: Use a function to determin the physical device of a bridge
We can not safely assume that the first device listed under a bridge is the
physical device.
With VXLAN isolation a vnet device can be attached to a bridge prior to the
vxlanXXXX device being attached.
We need to filter out those devices and then fetch the physical device attached
to the bridge.
In addition use the 'bridge' command instead of 'brctl'. 'bridge' is part of the
iproute2 utils just like 'ip' and should be considered as the new default.
This command is also available on EL6 and does not break any backwards compat.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: --set is deprecated, use --match-set
These messages are seen in the KVM Agent log:
--set option deprecated, please use --match-set
Functionality does not change
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* security group: PEP-8 and indentation fixes
There were a lot of styling problems in the code:
- Missing whitespace or exess whitespace
- CaMelCaSe function names and variables
- 2-space indentation instead of 4 spaces
This commit addresses those issues.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
The additional queues can enhance the performance of the VirtIO SCSI disk
and it is recommended to set this to the amount of vCPUs a Instance is assigned.
The optional queues attribute specifies the number of queues for the
controller. For best performance, it's recommended to specify a value matching
the number of vCPUs. Since 1.0.5 (QEMU and KVM only)
Source: https://libvirt.org/formatdomain.html#elementsVirtio
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* Allow KVM VM live migration with ROOT volume on file
* Allow KVM VM live migration with ROOT volume on file
- Add JUnit tests
* Address reviewers and change some variable names to ease future
implementation (developers can easily guess the name and use
autocomplete)
Added dummy and lo devices to be treated as a normal bridge slave devs.
Fixes#2998
Added two more device names (lo* and dummy*). Implemented tests. Code was refactored.
Improved paths concatenation code from "+" to Paths.get.
If a host has many routes this can be a magnitude faster then printing
all the routes and grepping for the default.
In some situations the host might have a large amount of routes due to
dynamic routing being used like OSPF or BGP.
In addition fix a couple of loglines which were throwing messages on
DEBUG while WARN and ERROR should be used there.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
When vxlan://untagged is used for public (or guest) network, use the
default public/guest bridge device same as how vlan://untagged works.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
These additional RBD features allow for faster lookups of how much space a RBD
image is using, but with the exclusive locking we prevent two VMs from writing
to the same RBD image at the same time.
These are the default features used by Ceph for any new RBD image.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Prevents errors while migrating VM from ISO:
Test 1: Deploy VM from ISO -> Live migrate VM to another host -> ERROR
Test 2: Register ISO using Direct Download on KVM -> Deploy VM from ISO -> Live migrate VM to another host -> ERROR
- Prevent NullPointerException migrating VM from ISO
- Prevent mount secondary storage on ISO direct downloads on KVM
Since we support only Ubuntu 16.04+ on master/4.12+, we can now use
the libvirt service name `libvirtd` for all distributions. This also
fixes an optional package name for libvirtd installation on Debian 9+.
Fixes#2909
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Windows has support for several paravirt features that it will use when running on Hyper-V, Microsoft's hypervisor. These features are called enlightenments. Many of the features are similar to paravirt functionality that exists with Linux on KVM (virtio, kvmclock, PV EOI, etc.)
Nowadays QEMU/KVM can also enable support for several Hyper-V enlightenments. When enabled, Windows VMs running on KVM will use many of the same paravirt optimizations they would use when running on Hyper-V.
A number of years ago, a PR was introduced that added a good portion of the code to enable this feature set, but it was never completed. This PR enables the existing features. The previous patch set detailed in #1013 also included the tests.
By selecting Windows PV, the enlightenment additions will be applied to the libvirt configuration. This is support on Windows Server 2008 and beyond, so all currently supported versions of Windows Server.
In our testing, we've seen benchmark improvements of around 20-25% running on Centos 7 hosts and it is also supported on Centos/RHEL 6.5 and later. Testing on Ubuntu would be appreciated.
When a Instance is (attempted to be) started in KVM Host the Agent
should not worry about the allocated memory on this host.
To make a proper judgement we need to take more into account:
- Memory Overcommit ratio
- Host reserved memory
- Host overcommit memory
The Management Server has all the information and the DeploymentPlanner
has to make the decision if a Instance should and can be started on a
Host, not the host itself.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This fixes#2763 by moving a post cert-renewal class for kvm
plugin/hypervisor to src/main/java. The regression is due to change
in file-system layout due to maven standard refactoring on master and
issue was not caught during forward-merging of a PR from 4.11 branch.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This ensure that fewer mount points are made on hosts for either
primary storagepools or secondary storagepools.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Now the KVM agent checks whether a storage pool is mounted or not mounted before calling storagePoolCreateXML().
Signed-off-by: Kai Takahashi <k-takahashi@creationline.com>
This introduces a new global setting `vm.configdrive.primarypool.enabled` to toggle creation/hosting of config drive iso files on primary storage, the default will be false causing them to be hosted on secondary storage. The current support is limited from hypervisor resource side and in current implementation limited to `KVM` only. The next big change is that config drive is created at a temporary location by management server and shipped to either KVM or SSVM agent via cmd-answer pattern, the data of which is not logged in logs. This saves us from adding genisoimage dependency on cloudstack-agent pkg.
The APIs to reset ssh public key, password and user-data (via update VM API) requires that VM should be shutdown. Therefore, in the refactoring I removed the case of updation of existing ISO. If there are objections I'll re-put the strategy to detach+attach new config iso as a way of updation. In the refactored implementation, the folder name is changed to lower-cased configdrive. And during VM start, migration or shutdown/removal if primary storage is enable for use, the KVM agent will handle cleanup tasks otherwise SSVM agent will handle them.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In 4.11.0, I added the ability to online migrate volumes from NFS to managed storage. This actually works for Ceph to managed storage in a private 4.8 branch, as well. I thought I had brought along all of the necessary code from that private 4.8 branch to make Ceph to managed storage functional in 4.11.0, but missed one piece (which is fixed by this PR).
This adds and allows Ubuntu 18.04 to be used as KVM host. In addition,
on the UI when hypervisor version key is missing, this adds and display
the host os and version detail which is useful to show the KVM host
os and version.
When cache mode 'none' is used for empty cdrom drives, systemvms
and guest VMs fail to start on newer libvirtd such as Ubuntu bionic.
The fix is ensure that cachemode is not declared when drives are empty
upon starting of the VM. Similar issue logged at redhat here:
https://bugzilla.redhat.com/show_bug.cgi?id=1342999
The workaround is to ensure that we don't configure cachemode for
cdrom devices at all. This also fixes live VM migration issue.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The three methods are named as "setXXX", actually, they are not simple setter or getter.
They are further renamed as "generateXXX" with dahn's comments.
* fix https://issues.apache.org/jira/browse/CLOUDSTACK-10356
* del patch file
* Update ResourceCountDaoImpl.java
* fix some format
* fix code
* fix error message in VolumeOrchestrator
* add check null stmt
* del import unuse class
* use BooleanUtils to check Boolean
* fix error message
* delete unuse function
* delete the deprecated function updateDomainCount
* add error log and throw exception in ProjectManagerImpl.java
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Several fixes addressed:
- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
There is a race condition in the monitoring of the migration process on KVM. If the monitor wakes up in the tight window after the migration succeeds, but before the migration thread terminates, the monitor will get a LibvirtException “Domain not found: no domain with matching uuid” when checking on the migration status. This in turn causes CloudStack to sync the VM state to stop, in which it issues a defensive StopCommand to ensure it is correctly synced.
Fix: Prevent LibvirtException: "Domain not found" caused by the call to dm.getInfo()
This fixes move refactoring error introduced in #2283
For instance, the class DatadiskTO is supposed to be in com.cloud.agent.api.to package. However, the folder structure it was placed in is com.cloud.agent.api.api.to.
Skip tests for cloud-plugin-hypervisor-ovm3:
For some unknown reason, there are quite a lot of broken test cases for cloud-plugin-hypervisor-ovm3. They might have appeared after some dependency upgrade and was overlooked by the person updating them. I checked them to see if they could be fixed, but these tests are not developed in a clear and clean manner. On top of that, we do not see (at least I) people using OVM3-hypervisor with ACS. Therefore, I decided to skip them.
Identention corrected to use spaces instead of tabs in XML files
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.
- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml
Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
The documentation of Libvirt specifies the requirement of using an XML namespace,
when having metadata in the Domain XML. The Nuage extenstion metadata was not
adhering to this specification, and the lastest Libvirt version ignores it in that case.
Allowed zone-wide primary storage based on a custom plug-in to be added via the GUI in a KVM-only environment (previously this only worked for XenServer and VMware)
Added support for root disks on managed storage with KVM
Added support for volume snapshots with managed storage on KVM
Enable creating a template directly from a volume (i.e. without having to go through a volume snapshot) on KVM with managed storage
Only allow the resizing of a volume for managed storage on KVM if the volume in question is either not attached to a VM or is attached to a VM in the Stopped state.
Included support for Reinstall VM on KVM with managed storage
Enabled offline migration on KVM from non-managed storage to managed storage and vice versa
Included support for online storage migration on KVM with managed storage (NFS and Ceph to managed storage)
Added support to download (extract) a managed-storage volume to a QCOW2 file
When uploading a file from outside of CloudStack to CloudStack, set the min and max IOPS, if applicable.
Included support for the KVM auto-convergence feature
The compression flag was actually added in version 1.0.3 (1000003) as opposed to version 1.3.0 (1003000) (changed this to reflect the correct version)
On KVM when using iSCSI-based managed storage, if the user shuts a VM down from the guest OS (as opposed to doing so from CloudStack), we need to pass to the KVM agent a list of applicable iSCSI volumes that need to be disconnected.
Added a new Global Setting: kvm.storage.live.migration.wait
For XenServer, added a check to enforce that only volumes from zone-wide managed storage can be storage motioned from a host in one cluster to a host in another cluster (cannot do so at the time being with volumes from cluster-scoped managed storage)
Don’t allow Storage XenMotion on a VM that has any managed-storage volume with one or more snapshots.
Enabled for managed storage with VMware: Template caching, create snapshot, delete snapshot, create volume from snapshot, and create template from snapshot
Added an SIOC API plug-in to support VMware SIOC
When starting a VM that uses managed storage in a cluster other than the one it last was running in, we need to remove the reference to the iSCSI volume from the original cluster.
Added the ability to revert a volume to a snapshot
Enabled cluster-scoped managed storage
Added support for VMware dynamic discovery
Extending Config Drive support
* Added support for VMware
* Build configdrive.iso on ssvm
* Added support for VPC and Isolated Networks
* Moved implementation to new Service Provider
* UI fix: add support for urlencoded userdata
* Add support for building systemvm behind a proxy
Co-Authored-By: Raf Smeets <raf.smeets@nuagenetworks.net>
Co-Authored-By: Frank Maximus <frank.maximus@nuagenetworks.net>
Co-Authored-By: Sigert Goeminne <sigert.goeminne@nuagenetworks.net>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
* Cleanup and Improve NetUtils
This class had many unused methods, inconsistent names and redundant code.
This commit cleans up code, renames a few methods and constants.
The global/account setting 'api.allowed.source.cidr.list' is set
to 0.0.0.0/0,::/0 by default preserve the current behavior and thus
allow API calls for accounts from all IPv4 and IPv6 subnets.
Users can set it to a comma-separated list of IPv4/IPv6 subnets to
restrict API calls for Admin accounts to certain parts of their network(s).
This is to improve Security. Should an attacker steal the Access/Secret key
of an account he/she still needs to be in a subnet from where accounts are
allowed to perform API calls.
This is a good security measure for APIs which are connected to the public internet.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
- Refactors and simplifies systemvm codebase file structures keeping
the same resultant systemvm.iso packaging
- Password server systemd script and new postinit script that runs
before sshd starts
- Fixes to keepalived and conntrackd config to make rVRs work again
- New /etc/issue featuring ascii based cloudmonkey logo/message and
systemvmtemplate version
- SystemVM python codebase linted and tested. Added pylint/pep to
Travis.
- iptables re-application fixes for non-VR systemvms.
- SystemVM template build fixes.
- Default secondary storage vm service offering boosted to have 2vCPUs
and RAM equal to console proxy.
- Fixes to several marvin based smoke tests, especially rVR related
tests. rVR tests to consider 3*advert_int+skew timeout before status
is checked.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This ports PR #1470 by @remibergsma.
Make the generated json files unique to prevent concurrency issues:
The json files now have UUIDs to prevent them from getting overwritten
before they've been executed. Prevents config to be pushed to the wrong
router.
2016-02-25 18:32:23,797 DEBUG [c.c.a.t.Request] (AgentManager-Handler-1:null) (logid:) Seq 2-4684025087442026584: Processing: { Ans: , MgmtId: 90520732674657, via: 2, Ver: v1, Flags: 10, [{"com.cloud.agent.api.routing.GroupA
nswer":{"results":["null - success: null","null - success: [INFO] update_config.py :: Processing incoming file => vm_dhcp_entry.json.4ea45061-2efb-4467-8eaa-db3d77fb0a7b\n[INFO] Processing JSON file vm_dhcp_entry.json.4ea4506
1-2efb-4467-8eaa-db3d77fb0a7b\n"],"result":true,"wait":0}}] }
On the router:
2016-02-25 18:32:23,416 merge.py __moveFile:298 Processed file written to /var/cache/cloud/processed/vm_dhcp_entry.json.4ea45061-2efb-4467-8eaa-db3d77fb0a7b.gz
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Otherwise we send down a 'null' to a ProcessBuilder in Java instead of a String and this
causes a NPE.
We should check first if the Instance has a IPv6 address before sending it there.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* CLOUDSTACK-10160: Fix typo in Libvirt XML definition for Virtio-SCSI
The attribute for the XML element 'controller' should be 'model' and
not 'mode'.
Source: https://libvirt.org/formatdomain.html#elementsControllers
A scsi controller has an optional attribute model, which is one of
'auto', 'buslogic', 'ibmvscsi', 'lsilogic', 'lsisas1068', 'lsisas1078',
'virtio-scsi' or 'vmpvscsi'.
In the current state a regular SCSI device is attached and not a Virtio-SCSI
device.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* CLOUDSTACK-10160: Add UnitTest for LibvirtVMDef.SCSIDef
To make sure the XML output string is correct
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This commit adds support for passing IPv6 Addresses and/or Subnets as
Secondary IPs.
This is groundwork for CLOUDSTACK-9853 where IPv6 Subnets have to be
allowed in the Security Groups of Instances to we can add DHCPv6
Prefix Delegation.
Use ; instead of : for separating addresses, otherwise it would cause
problems with IPv6 Addresses.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
* CLOUDSTACK-9972: Enhance listVolume API to include physical size and utilization.
Also fixed pool, cluster and pod info
* CLOUDSTACK-9972: Fix volume_view and duplicate API constant
* CLOUDSTACK-9972: Backport Do not allow vms to be deployed on hosts that are in disabled pod
* CLOUDSTACK-9972: Fix localization missing keys
* CLOUDSTACK-9972: Fix sql path
Commit enables a new feature for KVM hypervisor which purpose is to increase virtually amount of RAM available beyond the actual limit.
There is a new parameter in agent.properties: host.overcommit.mem.mb which enables adding specified amount of RAM to actually available. It is necessary to utilize KSM and ZSwap features which extend RAM with deduplication and compression.
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.
If these heartbeats are no longer received by the HV it will reset the Instance.
If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.
This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog
To the 'devices' section this will be added:
In the agent.properties the action to be taken can be defined:
vm.watchdog.action=reset
The same goes for the model. The Intel i6300esb is however the most commonly used.
vm.watchdog.model=i6300esb
When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.
If these heartbeats are no longer received by the HV it will reset the Instance.
If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.
This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog
To the 'devices' section this will be added:
<watchdog model='i6300esb' action='reset'/>
In the agent.properties the action to be taken can be defined:
vm.watchdog.action=reset
The same goes for the model. The Intel i6300esb is however the most commonly used.
vm.watchdog.model=i6300esb
Signed-off-by: Wido den Hollander <wido@widodh.nl>
- Removed three bg thread tasks, uses FSM event-trigger based scheduling
- On successful recovery, kicks VM HA
- Improves overall HA scheduling and task submission, lower DB access
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Since libvirt 1.2.2 libvirt will properly create volumes
using RBD format 2.
We can use libvirt to creates the volumes which strips a bit of
code from the CloudStack Agent's responsbility.
RBD format 2 is already used by all volumes created by CloudStack.
This format is the most recent format of RBD and is still actively
being developed.
This removes the support for Ubuntu 12.04 as that does not have the
proper libvirt version available.
Signed-off-by: Wido den Hollander wido@widodh.nl
We can use libvirt to creates the volumes which strips a bit of
code from the CloudStack Agent's responsbility.
RBD format 2 is already used by all volumes created by CloudStack.
This format is the most recent format of RBD and is still actively
being developed.
This removes the support for Ubuntu 12.04 as that does not have the
proper libvirt version available.
Signed-off-by: Wido den Hollander <wido@widodh.nl>