Commit Graph

275 Commits

Author SHA1 Message Date
nvazquez 67a51beefe
Merge branch 'main' into cks-enhancements-upstream 2025-02-03 08:45:53 -03:00
Daan Hoogland 2654890e86 Merge branch '4.20' 2025-02-01 21:20:08 +01:00
Abhishek Kumar 0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
Abhishek Kumar 7abda3b963 Merge remote-tracking branch 'apache/4.20' 2025-01-31 16:55:38 +05:30
Abhishek Kumar ae2ffbe40b Merge remote-tracking branch 'apache/4.19' into 4.20 2025-01-31 16:54:22 +05:30
Wei Zhou b9890875cc
CKS: use --delete-emptydir-data instead of deprecated --delete-local-data (#10234) 2025-01-30 15:49:26 +01:00
Daan Hoogland 98f5663954 Merge branch '4.20' 2025-01-24 17:10:43 +01:00
Daan Hoogland 34d2a3bc86 Merge branch '4.19' into 4.20 2025-01-24 17:01:42 +01:00
Abhishek Kumar 4787885fc0
cks: prevent npe on cluster listing with removed offering (#10075)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-01-24 09:47:52 +01:00
nvazquez e8a96d4c7c
Fix marvin test for adding and removing external nodes 2025-01-14 00:47:13 -03:00
nvazquez 90eb42a4b9
Add marvin test and fix update template for cks and since annotations 2025-01-13 00:21:50 -03:00
nvazquez 4b6e0ecf2b
Merge branch 'main' into cks-enhancements-upstream 2025-01-08 12:07:49 -03:00
nvazquez 36a02b81d0
Address review comments: plan CKS cluster deployment based on the node type 2025-01-08 11:57:30 -03:00
Daan Hoogland fadb39ece7 Merge release branch 4.20 to main
* 4.20:
  merge errors fixed
  Restrict the migration of volumes attached to VMs in Starting state (#9725)
  server, plugin: enhance storage stats for IOPS (#10034)
  Introducing granular command timeouts global setting (#9659)
  Improve logging to include more identifiable information (#9873)
2025-01-08 14:01:19 +01:00
nvazquez 89effc7eda
Add calico commands for ens35 interface 2025-01-07 12:00:11 -03:00
dahn d1cf45365a
Revert "pre-commit: add hook `check-yaml` (#9133)" (#10161)
This reverts commit 57867dc6b0.
2025-01-06 12:21:26 +01:00
Vishesh a4224e58cc
Improve logging to include more identifiable information (#9873)
* Improve logging to include more identifiable information for kvm plugin

* Update logging for scaleio plugin

* Improve logging to include more identifiable information for default volume storage plugin

* Improve logging to include more identifiable information for agent managers

* Improve logging to include more identifiable information for Listeners

* Replace ids with objects or uuids


* Improve logging to include more identifiable information for engine

* Improve logging to include more identifiable information for server

* Fixups in engine

* Improve logging to include more identifiable information for plugins

* Improve logging to include more identifiable information for Cmd classes

* Fix toString method for StorageFilterTO.java
2025-01-06 16:42:37 +05:30
nvazquez ce50ad21ac
Revert PR 9133 2025-01-04 00:50:48 -03:00
nvazquez 663dce1f18
Fix build 2025-01-02 12:27:46 -03:00
Pearl Dsilva 313facb8b5
Fix missing quotes 2025-01-02 11:34:16 -03:00
Pearl Dsilva e7884b3739
Set token TTL to 0 (no expire) for external etcd 2025-01-02 11:33:57 -03:00
Nicolas Vazquez 43d2131901
Fix CKS cluster scale calculations 2025-01-02 11:31:58 -03:00
Pearl Dsilva a4a5541aae
Consider cordoned nodes while getting ready nodes 2025-01-02 11:31:39 -03:00
Pearl Dsilva ab271839a2
Tweak setup-kube-system script for baremetal external nodes 2025-01-02 11:31:06 -03:00
Pearl Dsilva fe3efdbe57
CKS: Add external node to a CKS cluster deployed with etcd node(s) successfully
* CKS: Add external node to a CKS cluster deployed with etcd node(s) successfully

* simplify logic
2025-01-02 11:30:24 -03:00
Pearl Dsilva 9905f07b6e
Improve etcd indexing - start from 1 2025-01-02 11:30:05 -03:00
Nicolas Vazquez c884423b96
Fix: Use control node guest IP as join IP for external nodes addition 2025-01-02 11:28:48 -03:00
Pearl Dsilva 27d08e2f3f
CKS cluster with guest IP 2025-01-02 11:27:01 -03:00
Pearl Dsilva 749a71d957
CKS: Use Network offering of the network passed during CKS cluster creation to get the AS number 2025-01-02 11:24:57 -03:00
Pearl Dsilva e5d377a1ad
CKS: Add CNI configuration details to the response and UI 2025-01-02 10:57:20 -03:00
Pearl Dsilva 1dadd257a5
Fix deletion of Userdata / CNI Configuration in projects 2025-01-02 10:54:18 -03:00
Pearl Dsilva 51d4403adf
CKS: Do not pass AS number when network ID is passed 2025-01-02 10:47:48 -03:00
Pearl Dsilva 2a1de76517
Add support for using different CNI plugins with CKS
* Add support for using different CNI plugins with CKS

* remove unused import

* Add UI support and list cni config API

* necessary UI changes

* add license

* changes to support external cni

* UI changes

* Fix NPE on restarting VPC with additional public IPs

* fix merge conflict

* add asnumber to create k8s svc layer

* support cni framework to use as-numbers

* update code

* condition to ignore undefined jinja template variables
2025-01-02 10:42:29 -03:00
nvazquez b0eb65f31a
Merge branch 'main' into cks-enhancements-upstream 2025-01-02 08:27:10 -03:00
John Bampton 57867dc6b0
pre-commit: add hook `check-yaml` (#9133)
https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#check-yaml
2024-12-31 09:32:32 +01:00
Daan Hoogland 9b6f9b5f7d Merge release branch 4.20 to main
* 4.20:
  UI: Tooltip on the host information card to display the CPU speed in MHz and the memory value in MB (to 3 decimal places) (#9971)
  UI: Allow accounts of the `User` type to add other accounts or users to projects through UI (#9927)
  enable to create VPC portfowarding rules with source cidr (#7081)
  Add new column `last_id` to the table volumes (#9759)
  Allow VMWare import via another host (#9787)
  Linstor: add support for ISO block devices and direct download (#9792)
  get expunged VM data for job result (#9949)
  fix section divider display on auth page (#9966)
2024-12-03 16:33:51 +01:00
Daan Hoogland da54234585 Merge branch '4.19' into 4.20.merge 2024-12-03 16:32:15 +01:00
Rodrigo D. Lopez 4189bac8e0
enable to create VPC portfowarding rules with source cidr (#7081)
Co-authored-by: Lopez <rodrigo@scclouds.com.br>
Co-authored-by: Fabricio Duarte <fabricio.duarte.jr@gmail.com>
2024-11-28 17:53:07 +01:00
João Jandre d9774a8462 Updating pom.xml version numbers for release 4.21.0.0-SNAPSHOT
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-27 11:47:06 -03:00
João Jandre c63c7ee63e Updating pom.xml version numbers for release 4.20.1.0-SNAPSHOT
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-27 11:40:45 -03:00
João Jandre 2fe3fcef7c Updating pom.xml version numbers for release 4.20.0.0
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-19 08:54:07 -03:00
nvazquez 87a528d2b2
Merge branch 'main' into cks-enhancements-upstream 2024-10-03 00:00:10 -03:00
Wei Zhou 04c428cb2f
CKS: add ConfigDrive to cloud-init datasource_list in systemvm template (#7650)
* CKS: add ConfigDrive to cloud-init datasource_list in systemvm template

* systemvm template: update debian 11.7.0 iso url

* CKS: get K8S iso by LABEL=CDROM if config drive ISO is attached

* Revert "CKS: add ConfigDrive to cloud-init datasource_list in systemvm template"

This reverts commit b6863a5ce1.

* CKS: patch cloud-init in opt/cloud/bin/setup/cksnode.sh

* PR7650: move ConfigDrive before CloudStack in datasource list

* Revert "CKS: patch cloud-init in opt/cloud/bin/setup/cksnode.sh"

This reverts commit 75be03c6aa.

* CKS: fix ConfigDrive
2024-09-27 09:09:27 -03:00
João Jandre bf2cedea79 Merge branch 4.19 to main 2024-09-26 09:01:22 -03:00
João Jandre cb4713fabe Merge release branch 4.18 to 4.19
* 4.18:
  CKS: fix creation on shared network if HA is enabled (#8588)
2024-09-26 08:50:02 -03:00
Wei Zhou bb820f799b
CKS: fix creation on shared network if HA is enabled (#8588) 2024-09-26 08:37:25 -03:00
Vishesh 21d107c349
Merge branch '4.19' 2024-09-24 14:04:51 +05:30
Thomas O'Dowd 638c1526d0
Fix the Cloudian Integration SSO Redirect link (#9656) 2024-09-10 14:52:59 +02:00
Pearl Dsilva f8d8a9c7b3
NSX Integration fixes (#8906)
* Prevent addition of duplicate PF rules on scale up and no rules left behind on scale down (#32)

* fix missing dependency injection

* NSX: Fix concurrency issues on port forwarding rules deletion (#37)

* Fix concurrency issues on port forwarding rules deletion

* Refactor objectExists

* Fix unit test

* Fix test

* Small fixes

* CKS: Externalize control and worker node setup wait time and installation attempts (#38)

* NSX: Add shared network support (#41)

* NSX: Fix number of physical networks for Guest traffic checks and leftover rules on CKS cluster deletion (#45)

* Fix pf rules removal on CKS cluster deletion

* Fix check for number of physical networks for guest traffic

* Fix unit test

* fix logger

* NSX: Handle CheckHealthCommand to avoid host disconnection and errors on APIs

* NSX: Handle CheckHealthCommand to avoid host disconnection and errors on APIs

* Remove unused string

* fix logger

* Update UDP active monitor to ICMP

* Fix NPE on restarting VPC with additional public IPs

* NSX / VPC: Reuse Source NAT IP from systemVM range on restarts

* CKS: Public IP not found for VPC networks

* Externalize retries and inverval for NSX segment deletion (#67)

* remove unused import

* remove duplicate imports

* remove unused import

* revert externalizing cks settings

* fix test

* Refactor log messages

* Address comments

* Fix issue caused due to forward merge: 90fe1d

---------

Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2024-09-06 16:56:50 -03:00
Nicolas Vazquez 8c8d115a1e
feature: Support Multi-arch Zones (#9619)
This introduces the multi-arch zones, allowing users to select the VM arch upon deployment. 

Multi-arch zone support in CloudStack can allow admins to mix x86_64 & arm64 hosts within the same zone with the following changes proposed:
- All hosts in a clusters need to be homogenous, wrt host CPU type (amd64 vs arm64) and hypevisor
- Arch-aware templates & ISOs:
   -  Add support for a new arch field (default set of: amd64 and arm64), when unspecified defaults to amd64 and for existing templates & iso
   -  Allow admins to edit the arch type of the registered template & iso
- Arch-aware clusters and host:
   - Add new attribute field for cluster and hosts (kvm host agents can automatically report this, arch of the first host of the cluster is cluster's architecture), defaults to amd64 when not specified
   - Allow admins to edit the arch of an existing cluster
- VM deployment form (UI):
   - In a multi-arch zone/env, the VM deployment form can allow some kind of template/iso filtration in the UI
   - Users should be able to select arch: amd64 & arm64; but this is shown only in a multi-arch zone (env)
- VM orchestration and lifecycle operations:
   - Use of VM/template's arch to correctly decide where to provision the VM (on the correct strictly arch-matching host/clusters) & other lifecycle operations (such as migration from/to arch-matching hosts)

Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2024-09-06 12:14:54 +05:30