* Host HA code improvements
* Fix to not cancel VM HA items when Host HA is enabled & inspection in progress, and some code improvements
- When Host HA inspection in progress, the investigor returns the Host Status as Up which cancels the VM HA items
- Don't cancel the VM HA items, instead reschedule them to try again later
* Changes to consider Recovered/Available Host HA state along with the agent connection status to determine the Host HA inspection in progress or not, and some code improvements
* Refactoring Allocator classes
* Break into smaller methods random and firfit allocators.
* Added unit tests for random and firstfit allocators
* Move random allocator from cloud-plugins to cloud-server
* Add BaseAllocator abstract class for duplicate code
* Add missing license
* Add missing license to unit test file
* Remove host allocator random dependency
* Change exception message on smoke tests
* Remove conditional as it was never actually reached in the original flow
* Fix tests
* Fix flipped parameters
* Fix NPE while listing hosts for migration when suitableHosts is null
* Remove unnecessary stubbings
* Fix checkstyle
* Remove unnecessary file
* Rename exception error messages
* Apply suggestions from code review
Co-authored-by: Fabricio Duarte <fabricio.duarte.jr@gmail.com>
* Rename UserVmDetailVO references to VMInstanceDetailVO
* Remove unused imports
* Add new line at EOF
* Remove unnecessary random allocator pom
* Fix GPU allocation mistake
* Fix failing tests
---------
Co-authored-by: Fabricio Duarte <fabricio.duarte@scclouds.com.br>
Co-authored-by: Fabricio Duarte <fabricio.duarte.jr@gmail.com>
* Linstor: fix create volume from snapshot on primary storage
When creating a volume from a snapshot on Linstor primary storage
(with lin.backup.snapshots=false), the operation fails with:
"Only the following image types are currently supported: VHD, OVA,
QCOW2, RAW (for PowerFlex and FiberChannel)"
Root cause: the Linstor driver does not handle SNAPSHOT -> VOLUME in
its canCopy()/copyAsync() methods. This causes DataMotionServiceImpl
to fall through to StorageSystemDataMotionStrategy (selected because
Linstor advertises STORAGE_SYSTEM_SNAPSHOT=true). That strategy's
verifyFormatWithPoolType() rejects RAW format for Linstor pools,
since RAW is only allowed for PowerFlex and FiberChannel.
Additionally, VolumeOrchestrator.createVolumeFromSnapshot() attempts
to back up the snapshot to secondary storage when the storage plugin
does not advertise CAN_CREATE_TEMPLATE_FROM_SNAPSHOT. This backup
fails because the snapshot only exists on Linstor primary storage.
Fix:
- Add CAN_CREATE_TEMPLATE_FROM_SNAPSHOT capability so the
orchestrator skips the backup-to-secondary path
- Add canCopySnapshotToVolumeCond() to match SNAPSHOT -> VOLUME
when both are on the same Linstor primary store
- Wire it into canCopy() to intercept at DataMotionServiceImpl
before strategy selection, bypassing StorageSystemDataMotionStrategy
- Implement copySnapshotToVolume() which delegates to the existing
createResourceFromSnapshot() for native Linstor snapshot restore
This follows the same pattern used by the StorPool plugin, which
handles SNAPSHOT -> VOLUME directly in its driver rather than going
through StorageSystemDataMotionStrategy.
Tested on CloudStack 4.22 with Linstor LVM_THIN storage, creating
a volume from a 1TB CNPG Postgres database snapshot. Volume creates
successfully with correct path and deletes cleanly.
* Let CloudRuntimeException propagate from copySnapshotToVolume
Remove try/catch in copySnapshotToVolume so that CloudRuntimeException
from createResourceFromSnapshot propagates to the caller, ensuring
CloudStack properly notices and reports the failure.
* Fix CAN_CREATE_TEMPLATE_FROM_SNAPSHOT breaking template creation
Setting CAN_CREATE_TEMPLATE_FROM_SNAPSHOT unconditionally to true
caused createTemplate from snapshot to take the StorPool-specific
code path in TemplateManagerImpl, which sends a CopyCommand to a
system VM that Linstor cannot handle.
Fix: make CAN_CREATE_TEMPLATE_FROM_SNAPSHOT conditional on the same
flag as STORAGE_SYSTEM_SNAPSHOT (!BackupSnapshots). When snapshots
are backed up to secondary (the default), the old template creation
flow works. When snapshots stay on primary, the direct path is used.
Also fix checkstyle: remove unused DataObject import in test.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LinstorDataMotionStrategy.copyAsync would call createResource on the
destination pool's controller without first verifying that the
destination KVM host is registered as a LINSTOR satellite there. Two
failure modes:
1. The resource group's auto-placement filter happens to match a
different node (a registered satellite that is NOT the migration
destination), and the resource is silently created on the wrong
node. The subsequent migrate then fails because the destination
KVM host has no DRBD device for the resource.
2. The auto-placement filter has no candidates and the LINSTOR API
returns an opaque error. The operator has to correlate the
migration failure with an unrelated controller log entry to
understand what happened.
This change adds verifyDestinationIsLinstorSatellite() called at the
top of copyAsync. For each LINSTOR-typed destination pool it:
- fetches the controller's node list via LinstorUtil.getLinstorNodeNames
- throws CloudRuntimeException with a clear actionable message
(lists known satellites) if destHost.getName() is missing from
that list
- silently skips on transient controller errors so a network blip
against the controller doesn't block an otherwise valid migration
Non-LINSTOR destination pools in the volumeDataStoreMap are skipped
(mixed-storage migrations are unaffected).
* Move logs for values of the migration settings out of the loop
* Apply suggestions from code review
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
---------
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
Fixes an issue in NsxResource.executeRequest where Network.Service
comparison failed when DeleteNsxNatRuleCommand was executed in a
different process. Due to serialization/deserialization, the
deserialized Network.Service instance was not equal to the static
instances Network.Service.StaticNat and Network.Service.PortForwarding,
causing the comparison to always return false.
Co-authored-by: Andrey Volchkov <avolchkov@playtika.com>
(cherry picked from commit 30dd234b00)
* initial attempt at network.loadbalancer.haproxy.idle.timeout implementation
* implement test cases
* move idleTimeout configuration test to its own test case
`cursor` field when more pages are available. The previous implementation only
fetched the first page and ignored pagination.
This change updates the list retrieval flow to:
- follow the `cursor` chain until no further pages exist
- accumulate items from all pages
- return a single merged result to the caller
This ensures that list operations return the complete dataset rather than just
the first page.
Co-authored-by: Andrey Volchkov <avolchkov@playtika.com>
* kvm: fix wrong CheckVirtualMachineAnswer when vm does not exist
* kvm: add LibvirtCheckVirtualMachineCommandWrapperTest
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Fix domain parsing for GPU
* Add Display controller to GPU class check
this adds support for the amd instinct mi2xx accelorator crards in the discovery script.
Co-authored-by: Piet Braat <piet@phiea.nl>
Fixes an issue in NsxResource.executeRequest where Network.Service
comparison failed when DeleteNsxNatRuleCommand was executed in a
different process. Due to serialization/deserialization, the
deserialized Network.Service instance was not equal to the static
instances Network.Service.StaticNat and Network.Service.PortForwarding,
causing the comparison to always return false.
Co-authored-by: Andrey Volchkov <avolchkov@playtika.com>
* Fix NPE in NASBackupProvider when no running KVM host is available
ResourceManager.findOneRandomRunningHostByHypervisor() can return null
when no KVM host in the zone has status=Up (e.g. during management
server startup, brief agent disconnections, or host state transitions).
NASBackupProvider.syncBackupStorageStats() and deleteBackup() call
host.getId() without a null check, causing a NullPointerException that
crashes the entire BackupSyncTask background job every sync interval.
This adds null checks in both methods:
- syncBackupStorageStats: log a warning and return early
- deleteBackup: throw CloudRuntimeException with a descriptive message