Apply the review comments from the first round on #13061:
* FlashArrayAdapter.snapshot() and both getSnapshot() entry points now
wrap the returned FlashArrayVolume in withAddressType(). Without this,
snapshots taken against an NVMe-TCP pool had the constructor-default
AddressType.FIBERWWN and ProviderSnapshot.getAddress() emitted an FC
style WWN instead of the NVMe EUI-128, which the adaptive driver then
persisted as the snapshot path. Verified end-to-end against Purity 6.7.7:
a fresh NVMe-TCP snapshot now lands with install_path starting 006c...,
matching the source volume's EUI (previously it was 6-24a9370...).
* FlashArrayAdapter.attach() - retry path after 'Connection already
exists' no longer requires a hostgroup-scoped match for NVMe-TCP. If
hostgroup is not configured, or the existing connection is host-scoped,
fall back to matching by host name, same as the Fibre Channel branch.
Also normalize the 'volume lun is not found' message when no
connection list is returned.
* FlashArrayAdapter.attach() - initial 'Volume attach did not return lun
information' exception message now mentions both lun (FC) and nsid
(NVMe-TCP) so the error is not misleading on NVMe deployments.
* FlashArrayAdapter.getVolumeByAddress() - validate the EUI-128 length
before slicing. A short/malformed address used to throw
StringIndexOutOfBoundsException deep inside getFlashArrayItem and be
swallowed as 'not found'; now a clear RuntimeException is raised with
the expected vs actual length.
* FlashArrayVolume.getAddress() - same defensive check when building an
EUI-128 from the FlashArray volume serial; if the serial is shorter
than 24 hex chars, fail with a clear message instead of throwing
StringIndexOutOfBoundsException.
* MultipathNVMeOFAdapterBase.connectPhysicalDisk() - Integer.parseInt of
the STORAGE_POOL_DISK_WAIT detail is now guarded; a non-numeric value
falls back to the default rather than aborting the connect.
* MultipathNVMeOFAdapterBase.rescanAllControllers() - honour the boolean
return from Process.waitFor(timeout, unit). If an nvme ns-rescan invocation does
not complete in NS_RESCAN_TIMEOUT_SECS we destroyForcibly() it, so
hung nvme-cli processes do not accumulate while the namespace poll
loop retries.
* NVMeTCPAdapter - rename LOGGER_NVMETCP to LOGGER to match the naming
convention used in the other KVM adapters.
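The two MultipathNVMeOFAdapterBase guards above (tolerant parsing of
STORAGE_POOL_DISK_WAIT and the timed ns-rescan) can be sketched in plain
Java. This is a minimal illustration, not the adapter's code: the class
NvmeRescanSketch, its method names, and the default value are hypothetical.
Only Integer.parseInt, ProcessBuilder, Process.waitFor(long, TimeUnit) and
Process.destroyForcibly() are real JDK APIs.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the two guards; not CloudStack code.
final class NvmeRescanSketch {

    // Parse a wait-time detail, falling back to a default instead of aborting.
    static int parseWaitSeconds(String detail, int defaultSeconds) {
        if (detail == null) {
            return defaultSeconds;
        }
        try {
            return Integer.parseInt(detail.trim());
        } catch (NumberFormatException e) {
            return defaultSeconds;
        }
    }

    // Run a command (e.g. "nvme ns-rescan /dev/nvme0") with a hard timeout.
    // Returns true only if the process exited within the timeout.
    static boolean runWithTimeout(List<String> command, long timeoutSecs)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)                      // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // and throw both away
                .start();
        if (!p.waitFor(timeoutSecs, TimeUnit.SECONDS)) {
            // Hung process: kill it so retries in the poll loop do not pile up.
            p.destroyForcibly();
            return false;
        }
        return true;
    }
}
```

Returning a boolean rather than throwing lets the namespace poll loop simply
try again on its next iteration.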
Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>
The NVMe-oF KVM adapter refused every template copy request from the
adaptive storage orchestrator with UnsupportedOperationException, which
made it impossible to use an NVMe-TCP pool as primary storage for a VM
root disk: every deploy that landed a root volume on the pool failed
as soon as CloudStack tried to lay down the template.
Implement it the same way FiberChannel (SCSI) does: the storage provider
creates and connects a raw namespace ahead of time, then the adapter
resolves the host-side /dev/disk/by-id/nvme-eui.<NGUID> path via the
existing getPhysicalDisk plumbing (which will nvme ns-rescan and wait
for the symlink if the kernel has not yet picked it up) and qemu-img
converts the source image into the raw block device.
User-space encrypted source or destination volumes are rejected: the
FlashArray already encrypts at rest and layering qemu-img LUKS on top
of a hostgroup-scoped namespace shared between hosts is not sensible.
Source encryption would also break on migration because the
passphrase does not travel.
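The copy step amounts to one qemu-img convert from the source image into
the namespace's by-id path. A minimal sketch of the argument list, assuming
the NGUID is already known; TemplateCopySketch and convertCommand are
hypothetical names, and the real adapter goes through CloudStack's own
qemu-img wrapper rather than building argv by hand.

```java
import java.util.List;

// Hypothetical sketch of the argv handed to qemu-img; not CloudStack code.
final class TemplateCopySketch {

    static final String EUI_PREFIX = "/dev/disk/by-id/nvme-eui.";

    static List<String> convertCommand(String srcPath, String srcFormat, String nguid,
                                       boolean srcEncrypted, boolean dstEncrypted) {
        // The array already encrypts at rest; user-space LUKS is refused on either side.
        if (srcEncrypted || dstEncrypted) {
            throw new UnsupportedOperationException(
                    "Encrypted volumes are not supported on NVMe-TCP pools");
        }
        // Write the image straight into the pre-created, already connected namespace.
        return List.of("qemu-img", "convert",
                "-f", srcFormat,   // source format, e.g. qcow2
                "-O", "raw",       // the namespace is a raw block device
                srcPath,
                EUI_PREFIX + nguid);
    }
}
```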
With this change a CloudStack KVM VM can have its ROOT volume on an
NVMe-TCP pool (tested end-to-end on 4.23-SNAPSHOT against Purity 6.7.7:
template copy, first boot, live migrate with data disk, VM snapshot
with quiesce, and revert all work).
Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>
The adaptive storage framework hard-coded FiberChannel as the KVM-side
pool type for every provider it fronts. With a separate NVMeTCP pool
type now available (and a dedicated NVMe-oF adapter on the KVM side),
teach the lifecycle to route a pool to the right adapter based on a
transport= URL parameter:
https://user:pass@host/api?...&transport=nvme-tcp
-> StoragePoolType.NVMeTCP -> NVMeTCPAdapter on the KVM host
When the query parameter is absent the default stays FiberChannel, so
existing FC deployments on Primera or FlashArray continue to work
unchanged.
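The routing rule reduces to a lookup of one query parameter with a
Fibre Channel default. A minimal sketch, assuming an exact match on
transport=nvme-tcp; the PoolType enum here is a local stand-in for
Storage.StoragePoolType and TransportRoutingSketch is a hypothetical name.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the transport= routing rule; not CloudStack code.
final class TransportRoutingSketch {

    // Local stand-in for Storage.StoragePoolType.
    enum PoolType { FiberChannel, NVMeTCP }

    static PoolType poolTypeFor(String url) {
        String query = URI.create(url).getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // parameter without a value
                }
                String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                if ("transport".equals(key) && "nvme-tcp".equals(value)) {
                    return PoolType.NVMeTCP;
                }
            }
        }
        // Absent or any other value keeps existing FC pools unchanged.
        return PoolType.FiberChannel;
    }
}
```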
The choice is made in the shared AdaptiveDataStoreLifeCycleImpl rather
than inside each vendor plugin so every adaptive provider (FlashArray,
Primera, any future one) speaks the same configuration vocabulary.
Introduce an NVMe-over-Fabrics counterpart to the existing
MultipathSCSIAdapterBase / FiberChannelAdapter pair.
NVMe-oF is conceptually distinct from SCSI - it speaks the NVMe command
set, identifies namespaces by EUI-128 NGUIDs, and is multipathed by the
kernel natively rather than by device-mapper - so keeping it out of the
SCSI code path avoids special-casing inside every method that handles
volume paths, connect, disconnect, or size lookup.
MultipathNVMeOFAdapterBase (abstract)
* Parses volume paths of the form
type=NVMETCP; address=<eui>; connid.<host>=<nsid>; ...
into an AddressInfo whose path is
/dev/disk/by-id/nvme-eui.<eui>
which is the udev symlink the kernel emits for every NVMe namespace.
* connectPhysicalDisk polls the udev path and, on every iteration,
triggers nvme ns-rescan on all local NVMe controllers, to cover
target/firmware combinations that do not send an asynchronous event
notification when a new namespace is mapped.
* disconnectPhysicalDisk is a no-op; the kernel drops the namespace
when the target removes the host-group connection. The
ByPath variant only claims paths starting with
/dev/disk/by-id/nvme-eui. so foreign paths still fall through to
other adapters.
* Delegates getPhysicalDisk, isConnected, and getPhysicalDiskSize to
plain test -b / blockdev --getsize64 calls - no SCSI rescan, no dm
multipath, no multipath-map cleanup timer.
* createPhysicalDisk / createTemplateFromDisk / listPhysicalDisks /
copyPhysicalDisk all throw UnsupportedOperationException - these
are the responsibility of the storage provider, not the KVM
adapter, same as the SCSI base.
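The path handling in the base class can be sketched as a small token
parser plus a prefix check. This assumes the semicolon-separated key=value
layout shown above; NvmeOfPathSketch and its methods are hypothetical and
return a plain string rather than the real AddressInfo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the path handling described above; not CloudStack code.
final class NvmeOfPathSketch {

    static final String EUI_PREFIX = "/dev/disk/by-id/nvme-eui.";

    // Split "k1=v1; k2=v2; ..." into an ordered key/value map.
    static Map<String, String> parseTokens(String volumePath) {
        Map<String, String> tokens = new LinkedHashMap<>();
        for (String part : volumePath.split(";")) {
            String token = part.trim();
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed fragments
            }
            tokens.put(token.substring(0, eq).trim(), token.substring(eq + 1).trim());
        }
        return tokens;
    }

    // Resolve the udev symlink the kernel creates for the namespace.
    static String devicePath(String volumePath) {
        Map<String, String> tokens = parseTokens(volumePath);
        if (!"NVMETCP".equals(tokens.get("type"))) {
            throw new IllegalArgumentException("Not an NVMe-TCP volume path: " + volumePath);
        }
        String eui = tokens.get("address");
        if (eui == null || eui.isEmpty()) {
            throw new IllegalArgumentException("Missing address= in volume path: " + volumePath);
        }
        return EUI_PREFIX + eui;
    }

    // The ByPath variant only claims NVMe EUI paths; anything else falls through.
    static boolean claimsPath(String path) {
        return path != null && path.startsWith(EUI_PREFIX);
    }
}
```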
MultipathNVMeOFPool
* KVMStoragePool mirror of MultipathSCSIPool. Defaults to
Storage.StoragePoolType.NVMeTCP in the parameterless-fallback
constructor.
NVMeTCPAdapter
* Concrete adapter that registers itself for
Storage.StoragePoolType.NVMeTCP via the reflection-based scan in
KVMStoragePoolManager. Carries no logic of its own beyond binding
the base to the pool type.
A similar MultipathNVMeOFAdapterBase-derived NVMeRoCEAdapter (or
NVMeFCAdapter) can be supported later with one concrete subclass and a
new pool-type value; the base does not assume any particular
fabric-level transport.
NVMe-oF over TCP (NVMe-TCP) is conceptually a separate storage fabric
from Fibre Channel / iSCSI: it speaks the NVMe command set rather than
SCSI, identifies namespaces by EUI-128 NGUIDs rather than WWNs, and on
Linux is multipathed natively by the nvme driver rather than by
device-mapper multipath. Giving it its own StoragePoolType lets the
KVM agent dispatch the adaptive driver to a dedicated NVMe-oF adapter
(added in the next commit) without polluting the existing Fibre Channel
code path.
The new value is wired into the same format-routing and derivePath
fall-through paths that already special-case FiberChannel in
KVMStorageProcessor: NVMe-TCP volumes are also RAW and carry their
device path in DataObjectTO.path rather than in a managedStoreTarget
detail.
Teach FlashArrayAdapter to talk to a pool over NVMe over TCP instead of
Fibre Channel.
The transport is selected from a new transport= option on the storage
pool URL (or the equivalent storage_pool_details entry), e.g.
https://user:pass@fa:443/api?pod=cs&transport=nvme-tcp&hostgroup=cluster1
Defaults remain Fibre Channel / WWN addressing when transport is absent
or anything other than nvme-tcp, so existing FC pools are unaffected.
Beyond the transport parsing itself the adapter now:
* Tracks a per-pool volumeAddressType (AddressType.NVMETCP or
FIBERWWN) and stamps every volume it hands back to the framework
with it (withAddressType), so the adaptive driver path stores the
correct type=... field in the CloudStack volume path (used later
by the KVM driver to locate the device).
* Attaches pod-backed NVMe-TCP volumes at the host-group level
(POST /connections?host_group_names=...) instead of per-host, so
the array assigns a consistent NSID to every member host; falls
back to per-host attach for FC or when no hostgroup is configured.
* Tolerates a missing nsid in the FlashArray connections response
for NVMe-TCP - Purity does not return one for host-group NVMe
connections; the namespace is identified on the host by EUI-128
from FlashArrayVolume.getAddress(), so a placeholder value is
returned to the caller purely for informational tracking.
* Resolves NVMETCP addresses back to volumes in getVolumeByAddress
by reversing the EUI-128 layout (strip optional eui. prefix, drop
leading 00 and the embedded Pure OUI).
* Indexes NVMe connections in getConnectionIdMap by host name (the
array returns one entry per host inside a host-group connection),
so connid.<hostname> tokens in the path still match in
parseAndValidatePath on the KVM side.
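Reversing the EUI-128 layout, including the up-front length check that
replaces the old StringIndexOutOfBoundsException, can be sketched as
follows. EuiReverseSketch and serialFromAddress are hypothetical names;
the sketch does not normalize case or verify the embedded OUI.

```java
// Hypothetical sketch of reversing an NVMe EUI-128 to a FlashArray serial; not CloudStack code.
final class EuiReverseSketch {

    static final int NGUID_HEX_LEN = 32;

    static String serialFromAddress(String address) {
        if (address == null) {
            throw new RuntimeException("NVMe EUI-128 address is null");
        }
        String nguid = address.startsWith("eui.") ? address.substring(4) : address;
        // Validate before slicing so a malformed address fails with a clear message.
        if (nguid.length() != NGUID_HEX_LEN) {
            throw new RuntimeException("Invalid NVMe EUI-128 address '" + address
                    + "': expected " + NGUID_HEX_LEN + " hex chars, got " + nguid.length());
        }
        // Layout: 00 | serial[0:14] | 24a937 (Pure OUI) | serial[14:24]
        return nguid.substring(2, 16) + nguid.substring(22, 32);
    }
}
```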
Followed by a matching adaptive/KVM driver change (separate commit).
Preparatory data-model changes for NVMe-TCP support on the adaptive
storage framework. No behaviour change for existing Fibre Channel
users - the extra enum value, field, and getter/setter are only
exercised by callers that explicitly use them.
ProviderVolume.AddressType gains an NVMETCP value alongside FIBERWWN,
so adapters can declare that a volume is addressed by an NVMe EUI-128
(NGUID) rather than a SCSI WWN.
FlashArrayVolume.getAddress() produces the NGUID layout expected by
the Linux kernel for a FlashArray NVMe namespace:
00 + serial[0:14] + 24a937 (Pure 6-hex OUI) + serial[14:24]
which matches the /dev/disk/by-id/nvme-eui.<id> symlink emitted by
udev. Fibre Channel callers (addressType != NVMETCP) still get the
existing 6 + 24a9370 + serial form.
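Both address forms can be built from the 24-hex-char serial as below,
with the length check that avoids slicing a short serial. This is a
minimal sketch: FlashArrayAddressSketch is a hypothetical name, and
lowercasing the serial is an assumption based on udev emitting lowercase
hex in the by-id symlink.

```java
import java.util.Locale;

// Hypothetical sketch of the two FlashArray address layouts; not CloudStack code.
final class FlashArrayAddressSketch {

    static final String PURE_OUI = "24a937";
    static final int SERIAL_HEX_LEN = 24;

    // NVMe NGUID: 00 + serial[0:14] + 24a937 + serial[14:24] (32 hex chars).
    static String nvmeEui(String serial) {
        String s = serial.toLowerCase(Locale.ROOT); // assumption: udev uses lowercase hex
        if (s.length() < SERIAL_HEX_LEN) {
            throw new RuntimeException("FlashArray serial '" + serial
                    + "' is too short for an EUI-128: expected " + SERIAL_HEX_LEN
                    + " hex chars, got " + s.length());
        }
        return "00" + s.substring(0, 14) + PURE_OUI + s.substring(14, 24);
    }

    // Fibre Channel NAA WWN: 6 + 24a9370 + serial.
    static String fcWwn(String serial) {
        return "6" + "24a9370" + serial.toLowerCase(Locale.ROOT);
    }
}
```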
FlashArrayConnection gains a nsid field to carry the namespace id the
FlashArray REST API attaches to host-group-scoped NVMe connections,
when it is present.
* initial attempt at network.loadbalancer.haproxy.idle.timeout implementation
* implement test cases
* move idleTimeout configuration test to its own test case
`cursor` field when more pages are available. The previous implementation only
fetched the first page and ignored pagination.
This change updates the list retrieval flow to:
- follow the `cursor` chain until no further pages exist
- accumulate items from all pages
- return a single merged result to the caller
This ensures that list operations return the complete dataset rather than just
the first page.
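The cursor-following loop can be sketched generically. The API client and
its page type are not named in this message, so Page, fetchAll and the
page-fetching function are hypothetical; the sketch treats a null or
empty cursor as the end of the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of cursor pagination; not the actual client code.
final class CursorPaginationSketch {

    static final class Page<T> {
        final List<T> items;
        final String cursor; // null or empty when there are no more pages

        Page(List<T> items, String cursor) {
            this.items = items;
            this.cursor = cursor;
        }
    }

    // Follow the cursor chain, accumulating items from every page.
    static <T> List<T> fetchAll(Function<String, Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        String cursor = null; // the first request carries no cursor
        do {
            Page<T> page = fetchPage.apply(cursor);
            all.addAll(page.items);
            cursor = page.cursor;
        } while (cursor != null && !cursor.isEmpty());
        return all;
    }
}
```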
Co-authored-by: Andrey Volchkov <avolchkov@playtika.com>
* kvm: fix wrong CheckVirtualMachineAnswer when vm does not exist
* kvm: add LibvirtCheckVirtualMachineCommandWrapperTest
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Fix domain parsing for GPU
* Add Display controller to GPU class check
This adds support for the AMD Instinct MI2xx accelerator cards in the discovery script.
Co-authored-by: Piet Braat <piet@phiea.nl>
Fixes an issue in NsxResource.executeRequest where Network.Service
comparison failed when DeleteNsxNatRuleCommand was executed in a
different process. Due to serialization/deserialization, the
deserialized Network.Service instance was not equal to the static
instances Network.Service.StaticNat and Network.Service.PortForwarding,
causing the comparison to always return false.
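The failure mode can be reproduced with any class that relies on identity
of static instances. In this sketch Service is a simplified, hypothetical
stand-in for Network.Service, and Java serialization stands in for
whatever serializer carries the command between processes. Comparing by
name is one robust check; it is not necessarily the exact fix applied here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical reproduction of the identity problem; not CloudStack code.
final class ServiceIdentitySketch {

    // Simplified stand-in for a class exposing well-known static instances.
    static final class Service implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Service StaticNat = new Service("StaticNat");
        static final Service PortForwarding = new Service("PortForwarding");

        private final String name;

        Service(String name) { this.name = name; }

        String getName() { return name; }
        // No equals() override, so equality is object identity.
    }

    // Serialize and deserialize, as happens when a command crosses a process boundary.
    static Service roundTrip(Service s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Service) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Robust check: compare by name rather than by reference.
    static boolean isStaticNat(Service s) {
        return s != null && Service.StaticNat.getName().equals(s.getName());
    }
}
```

When Java serialization itself is in play, a readResolve() method that
maps the deserialized copy back to the matching static instance is the
other common remedy.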
Co-authored-by: Andrey Volchkov <avolchkov@playtika.com>
* Fix NPE in NASBackupProvider when no running KVM host is available
ResourceManager.findOneRandomRunningHostByHypervisor() can return null
when no KVM host in the zone has status=Up (e.g. during management
server startup, brief agent disconnections, or host state transitions).
NASBackupProvider.syncBackupStorageStats() and deleteBackup() call
host.getId() without a null check, causing a NullPointerException that
crashes the entire BackupSyncTask background job every sync interval.
This adds null checks in both methods:
- syncBackupStorageStats: log a warning and return early
- deleteBackup: throw CloudRuntimeException with a descriptive message
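The two null-handling policies can be sketched side by side. Host and the
supplier are hypothetical stand-ins for CloudStack's Host and
ResourceManager.findOneRandomRunningHostByHypervisor(), and
IllegalStateException stands in for CloudRuntimeException so the sketch
has no CloudStack dependency.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch of the two null-check policies; not CloudStack code.
final class BackupHostGuardSketch {

    interface Host { long getId(); }

    private static final Logger LOG = Logger.getLogger(BackupHostGuardSketch.class.getName());

    // Background sync: a missing host is usually transient, so warn and skip this round.
    static boolean syncBackupStorageStats(Supplier<Host> findRunningHost, long zoneId) {
        Host host = findRunningHost.get();
        if (host == null) {
            LOG.warning("No running KVM host in zone " + zoneId + ", skipping backup storage stats sync");
            return false;
        }
        // ... the real method would send the stats command to host.getId()
        return true;
    }

    // User-initiated delete: fail loudly with a descriptive message.
    static long hostIdForDelete(Supplier<Host> findRunningHost, long zoneId) {
        Host host = findRunningHost.get();
        if (host == null) {
            // IllegalStateException stands in for CloudRuntimeException here.
            throw new IllegalStateException("Unable to delete backup: no running KVM host found in zone " + zoneId);
        }
        return host.getId();
    }
}
```

The asymmetry is deliberate: the periodic task retries on its own next
interval, while a delete request has a caller who needs to see the error.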