* NSX integration - skeletal code
* Fix module not loading on startup
* add upgrade path and daos
\n add nsx controller command
* add support for adding and listing nsx provider to a zone
* add license
* add default VPC offering and update upgrade path
* add global setting to enable nsx plugin
* add delete nsx controller operation
* add nsxresource
* add NSX resource , api client, create tier1 gw
* update db
* update response and add license
* Add support to create and delete nsx tier-1 gateway
* add license
* cleanup and add skeletal code for network creation
* add create/delete segment and UI integration
* add license
* address code smells - part 1
* fix test / build failure
* NSX integration - skeletal code
* Fix module not loading on startup
* add upgrade path and daos
\n add nsx controller command
* add support for adding and listing nsx provider to a zone
* add license
* add default VPC offering and update upgrade path
* add global setting to enable nsx plugin
* add delete nsx controller operation
* add nsxresource
* add NSX resource , api client, create tier1 gw
* update db
* update response and add license
* Add support to create and delete nsx tier-1 gateway
* add license
* cleanup and add skeletal code for network creation
* add create/delete segment and UI integration
* add license
* address code smells - part 1
* fix test / build failure
* add ui changes + update nsx_provider table transport zones + use NSX broadcast domain for add nics to router
* ui: fix password field, and backend changes
* add route advertisement
* update offering
* update offering
* add sleep before deletion of vpc / tier g/w for ports to be removed
* move creation of segments to design phase
* change provider to VPC router for Dhcp & dns service in an nsx offering
* Add public nic for NSX
* reserve first IP (after g/w) of subnet for router nic - NSX
* revert reserving 1st IP in vpc segments
* [NSX] Create a DHCP relay and add it to a VPC tier segment (#107)
* Create DHCP relay command and execute request
* In progress integrate with networking
* Create DHCP relay config on the network VR allocation
* Revert domain router dao changes
* Create DHCP relay con VR nic plug to NSX network
* Link DHCP relay config to segment after creation
* [NSX] Cleanup DHCP Relay config on segment deletion (#108)
* Cleanup DHCP Relay config on segment deletion
* update segment & relay name generators and call delete dhcprelay after deletion of segment
* address comment
* [NSX] Fix DHCP relay config deletion was missing zone name (#8068)
* [NSX] Refactor API wrapper operations (#8059)
* [NSX] Refactor API wrapper operations
* Big refactor
* Address review comment
* change network cidr to cidr to prevent NPE
* add domain and zone names to the various networks - vpc & tier
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
* Nsx unit tests (#8090)
* Add tests
* add test for NsxGuestNetworkGuru
* add unit tests for NsxResource
* add unti tests for NsxElement
* cleanup
* [NSX] Refactor API wrapper operations
* update tests
* update tests - add nsxProviderServiceImpl test
* add unit test - NsxServiceImpl
* add license
* Big refactor
* Address review comment
* change network cidr to cidr to prevent NPE
* add domain and zone names to the various networks - vpc & tier
* fix tests
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* modify NSX resource naming convention (#8095)
* modify NSX resource naming convention
* remove unused imports
* add a setup phase between desgin and implementation of a network for intermediary steps
* add method to all classes
* NSX: Refactor Network & VPC offering (#8110)
* [NSX] Refactor API wrapper operations
* Network offering changes for NSX
* fix services and provider combination
* address comments: rename param
* update nsx_mode parameter
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* fix test
* [NSX] Allow NSX isolated networks (#8132)
* Add network offerings for NSX on isolated networks
* Fix offerings creation
* In progress NSX isolated network
* Fixes
* Fix NIC allocation to router
* NSX: Add Step for Adding Public traffic network for NSX During zone creation (#8126)
* NSX: Add Step for Adding Public traffic network for NSX
* address comments and cleanup
* address comment
* remove indent
* NSX: Create and Delete static NAT & Port forward rules (#8131)
* NSX: Create and delete NSX Static Nat rules
* fix issues with static nat
* add static nat
* Support to add and delete Port forward rules
* add license
* fix adding multiple pf rules
* cleanup
* fix lint check
* fix smoke tests
* fix smoke tests
* Nsx add lb rule (#8161)
* NSX: Create and delete NSX Static Nat rules
* fix issues with static nat
* add static nat
* Support to add and delete Port forward rules
* add license
* fix adding multiple pf rules
* cleanup
* NSX: Add support to create and delete Load balancer rules
* fix deletion of lb rules
* add header file and update protocol detail
* build failure fix
* [NSX] Add SNAT support (#8100)
* In progress add source NAT
* Fix after merge
* Fix tests
* Fix NPE on isolated network deletion
* Reserve source NAT IP when its not passed for NSX VPC
* Create source NAT rule on VR NIC allocation
* Fix update VPC and remove VPC to update and remove SNAT rule
* Fix packaging
* Address review comment
* Fix build
* fix build - unused import
* Add defensive checks
* Add missing design to NSX public guru
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
* NSX: Fix VR public NIC allocation (#8166)
* NSX: fix LB member addition and deletion and add defensive checks (#8167)
* Fix public NIC NPE on broadcast URI
* NSX: Router Public nic to get IP from systemVM Ip range (#8172)
* NSX: Router Public nic to get IP from systemVM Ip range
* Fix VR IP address and setSourceNatIp command
* NSX: hide systemVM reserved IP range SourceNAT
* fix test
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* fix test failure
* test failure fix
* [NSX] Fix update source NAT IP (#8176)
* [NSX] Fix update source NAT IP
* Fix startup
* Fix API result
* NSX - add LB route Advertizement (#8192)
* [NSX] Add ACL types support (#8224)
* NSX: Create segment group on segment creation
* Add unit tests
* Remove group for segment before removing segment
* Create Distributed Firewall rules
* Remove distributed firewall policy on segment deletion
* Fix policy rule ID and add more unit tests
* Fix DROP action rules and transform tests
* Add new ACL rules
* Fixes
* associate security policies with groups and not to DFW and add deletion of rules
* Fix name convention
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
* NSX: Fix creation of VPCs (#8320)
* Fix ACL rules creation (#8323)
* [NSX] Fix database views (#8325)
* NSX: Add CKS Support & Firewall rules for Isolated Networks (#8189)
* NSX: Add ALL LB IP to the list of route advertisements in tier1
* NSX: Support Source NAT on NSX Isolated networks
* NSX: Cks Support
* NSX: Create segment group on segment creation
* Add unit tests
* Remove group for segment before removing segment
* Create Distributed Firewall rules
* Remove distributed firewall policy on segment deletion
* Fix policy rule ID and add more unit tests
* Add support for routed NSX Isolated networks \n and non RFC 1918 compliant IPs
* Add support for routed NSX Isolated networks \n and non RFC 1918 compliant IPs
* Add Firewall rules
* build failure - fix unit test
* fix npes
* Add support to delete firewall rules
* update nsx cks offering
* add license
* update order of ports in PF & FW rules
* fix filter for getting transport zones
* CKS support changed - MTU updated, etc
* add LB for CKS on VPC
* address comments
* adapt upstream cks logic for vpc
* rever mtu hack
* update UI changes as per upstream fix
* change display test for CKS n/w offerings for isolated and VPC tiers
* add extra line for linter
* address comment
* revert list change
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* fix ui build failure
* [NSX] Address SonarCloud Bugs (#8341)
* [NSX] Address SonarCloud Bugs
* Fix NSX API connection issues
* NSX: Add unit tests to increase coverage (#8355)
* NSX: Add unit tests
* cleanup unused imports
* add more unit tests
* add tests for publicnsxnetworkguru
* add license
* fix build failures
* address sonar comment
* fix security hotspots
* NSX: Add more unit tests (#8381)
* NSX : Unit tests
* remove unused imports
* remove unused import causing build failure
* fix build failures due to unused imports
* fix build failure
* fix test assertion
* remove unused imports
* remove unused import
* Nsx UI zone bug (#8398)
* NSX: Attempt to fix NSX Zone creation bug for public networks
* fix zone wizard public traffic issue
* add proper filtering of offerings based on VPC nsx mode
* clean up console logs
* NSX: Fix code smells and reported bugs (#8409)
* NSX: Fix code smells and reported bugs
* fox override issue
* remove unused imports
* fix test
* refactor code to reduce complexity
* add lisence
* cleanup
* fix build failure
* fix build failure
* address comments
* test - add config to ignore certain files from test coverage
* test exclusion of classes from test cov
* rever pom changes
* [NSX] Add more unit tests (#8431)
* [NSX] Add more unit tests
* More tests
* Fix build errors
* NSX: Prevent creation of L2 and Shared networks for NSX (#8463)
* NSX: Prevent creation of L2 and Shared networks for NSX
* add checks to backend to prevent creation of l2 and shared networks in nsx zones and filter only nsx offerings when creating isolated networks
* cleanup
* NSX: Fix code smells (#8436)
* NSX: Fix code smells
* Add changes to service creation logic
* CKS: Add action to during firewall rule creation (#8498)
* NSX,UI: Deduplicate network list when creating kubernetes clusters (#8513)
* NSX: Make LB service selectable in network offering (#8512)
* NSX: Make LB service selectable in network offering
* fix label
* address comments
* address comments
* NSX: Add appropriate error message when icmp type is set to -1 for NSX (#8504)
* NSX: Add appropriate error message when icmp type is set to -1 for NSX
* address comments
* update text
* fix test
* fix test - build failure
* fix test - build failure
* NSX: Cleanup NSX resources during k8s cluster cleanup (#8528)
* fix test failure
* NSX: Improve segment deletion process (#8538)
* NSX: Add passive monitor for NSX LB to test whether a server is available (#8533)
* NSX: Add passive monitor for NSX LB to test whether a server is available
* Add active monitors too
* fix build failure
* NSX: Add check for ICMP code / type for NSX zones (#8542)
* NSX: Fix Routed Mode for Isolated and VPC networks (#8534)
* NSX: Fix Routed Mode for Isolated and VPC networks
* NSX: Fix Routed mode - add checks for ports added for FW rules
* clean up code
* fix build failure
* NSX: Add retry logic with sleep to delete segments (#8554)
* NSX: Add retry logic with sleep to delete segments
* add logs
* Update pom XML for the NSX project
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
There are a lot of test failures due to test_vm_life_cycle.py in multiple PRs due to host not available for migration of VMs.
#8438 (comment)
#8433 (comment)
#7344 (comment)
While debugging I noticed that the hosts get stuck in Connecting state because MS is waiting for a response of the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (default timeout).
On the agent side, it gets stuck waiting for a response from the Script execution.
To reproduce, run smoke/test_vm_life_cycle.py (TestSecuredVmMigration test class to be specific). Once the tests are complete, you will notice that some hosts are stuck in Connecting state. And restarting the agent fails due to the named lock. Locks on DB can be checked using the below query.
SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE PROCESSLIST_ID <> CONNECTION_ID() \G;
This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock from database is released.
This PR fixes a regression caused by #8465 on advanced zones, import fails with:
2024-01-10 12:13:33,234 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Allocating nic for vm 142272e8-9e2e-407b-9d7e-e9a03b81653c in network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10} during import
2024-01-10 12:13:33,239 ERROR [o.a.c.v.UnmanagedVMsManagerImpl] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Failed to import NICs while importing vm: i-2-31-VM
com.cloud.exception.InsufficientVirtualNetworkCapacityException: Unable to acquire Guest IP address for network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10}Scope=interface com.cloud.dc.DataCenter; id=1
at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.importNic(NetworkOrchestrator.java:4582)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importNic(UnmanagedVMsManagerImpl.java:859)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importVirtualMachineInternal(UnmanagedVMsManagerImpl.java:1198)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstanceFromHypervisor(UnmanagedVMsManagerImpl.java:1511)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.baseImportInstance(UnmanagedVMsManagerImpl.java:1342)
at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstance(UnmanagedVMsManagerImpl.java:1282)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Also, addresses the VNC password field set instead of a fixed string
This PR makes changes to use cluster's free metrics instead of used while computing imbalance for the cluster. This allows DRS to run for clusters where hosts doesn't have the same amount of metrics.
To prevent errors during multi-user access, use account UUID to create/access user on the provider side. Also, update the existing secret key for a user that already exists.
1. Problem description
In Apache CloudStack (ACS), when a VM is deployed in a host with the KVM hypervisor, an XML file is created in the assigned host, which has a property shares that defines the weight of the VM to access the host CPU. The value of this property has no unit, and it is a relative measure to calculate how much CPU a given VM will have in the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies at the range value of shares that varies between both versions: [2, 264144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000 and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default one in current Linux distributions; thus, it is necessary to address this limitation.
Equation 1
shares = CPU * speed
Fixes: #6744
2. Proposed changes
To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, using the number of cores and the nominal speed of the host's CPU as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.
The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed with a value of 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.
Equation 2
shares = (VM requested shares * cgroup upper limit)/host max shares
To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM calculated shares will be verified if it does not surpass the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, the scale of VMs will also have the same verification for the VM's host.
To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency.
It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now.
This PR fixes the test failures with CKS HA-cluster upgrade.
In production, the CKS HA cluster should have at least 3 control VMs as well.
The etcd cluster requires 3 members to achieve reliable HA. The etcd daemon in control VMs uses RAFT protocol to determine the roles of nodes. During upgrade of CKS with HA, the etcd become unreliable if there are only 2 control VMs.
This PR provides a new primary storage volume type called "FiberChannel" that allows access to volumes connected to hosts over fiber channel connections. It requires Multipath to provide path discovery and failover. Second, the PR adds an AdaptivePrimaryDatastoreProvider that abstracts how volumes are managed/orchestrated from the connector to communicate with the primary storage provider, using a ProviderAdapter interface, allowing the code interacting with the primary storage provider API's to be simpler and have no direct dependencies on Cloudstack code. Lastly, the PR provides an implementation of the ProviderAdapter classes for the HP Enterprise Primera line of storage solutions and the Pure Flash Array line of storage solutions.
Sometimes the hostStats object of the agents becomes null in the management server. It is a rare situation, and we haven't found the root cause yet, but it occurs occasionally in our CloudStack deployments with many hosts.
The hostStat is null, even though the agent is UP and hosting multiple VMs. It is possible to access the VM consoles and execute tasks on them.
This pull request doesn't address the issue directly; rather it displays those hosts in Prometheus so we can restart the agent and get the necessary information.
This PR adds the capability in CloudStack to convert VMware Instances disk(s) to KVM using virt-v2v and import them as CloudStack instances. It enables CloudStack operators to import VMware instances from vSphere into a KVM cluster managed by CloudStack. vSphere/VMware setup might be managed by CloudStack or be a standalone setup.
CloudStack will let the administrator select a VM from an existing VMware vCenter in the CloudStack environment or external vCenter requesting vCenter IP, Datacenter name and credentials.
The migrated VM will be imported as a KVM instance
The migration is done through virt-v2v: https://access.redhat.com/articles/1351473, https://www.ovirt.org/develop/release-management/features/virt/virt-v2v-integration.html
The migration process timeout can be set by the setting convert.instance.process.timeout
Before attempting the virt-v2v migration, CloudStack will create a clone of the source VM on VMware. The clone VM will be removed after the registration process finishes.
CloudStack will delegate the migration action to a KVM host and the host will attempt to migrate the VM invoking virt-v2v. In case the guest OS is not supported then CloudStack will handle the error operation as a failure
The migration process using virt-v2v may not be a fast process
CloudStack will not perform any check about the guest OS compatibility for the virt-v2v library as indicated on: https://access.redhat.com/articles/1351473.
On no access to the storage nodes, we now create a temporary resource from the snapshot and copy that data into the secondary storage. Revert works the same, just that we now also look additionally for any Linstor agent node.
Also enables now backup snapshot by default.
This whole BackupSnapshot functionality was introduced in 4.19,
so I would be happy if this still could be merged.
Co-authored-by: Stephan Krug <stephan.krug@scclouds.com.br>
Co-authored-by: GaOrtiga <49285692+GaOrtiga@users.noreply.github.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
This PR adds an config option for the Linstor primary storage driver, that allows you to automatically backup
volume snapshots to the secondary storage.
Additionally it will not mangle the need java-linstor dependency into the client.jar, but instead just copy
the java-linstor.jar into lib.
Config option is called: lin.backup.snapshots and is default false
The scope of this change should be limited, as it only touches the Linstor driver and a part of copyAsync
was implemented with 2 new Linstor specific commands.