Commit Graph

12 Commits

Author SHA1 Message Date
Suresh Kumar Anaparti 9dceae4614
MS maintenance improvements (#10417)
* Update last agents during ms maintenance, and some code improvements

* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]

* Migrate systemvm agents before routing host agents, and some code improvements

* Added events for ms maintenance and shutdown operations

* Added the following ms maintenance and shutdown improvements

- block new agent connections during prepare for maintenance of ms

- maintain avoids ms list

- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents

- updated setup ms list and migrate agent connections to executor service

- migrate agent connection through executor, and send the answer to the ms host that initiated the migration

- re-initialize ssl handshake executor if it is shutdown

- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states

- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states

- stop agent connections monitor on ms maintenance

- update avoid ms list in ready command

- updated connected host from the client connection

- update last agents in ms metrics from the database

- updated some agent config descriptions

- update last management server in the hosts during shutdown

- added agents and lastagents in management server response

- updated management server maintenance & shutdown unit tests

- some code improvements

* refactored code / addressed comments

* removed shutdown testcase (maybe, calling System.exit)

* Revert "removed shutdown testcase (maybe, calling System.exit)"

This reverts commit e14b071715.

* avoid system.exit during shutdown test

* code improvements

* testcase fix

* Fix cutoff time in agent connections monitor thread
2025-03-19 14:18:05 +05:30
Abhishek Kumar 0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
Charles Weng 64593574d8
test update and get connected hosts (#9136) 2024-06-12 15:59:01 +02:00
Vishesh e26d49de4d
agent: remove powermock from tests (#7637) 2023-06-20 08:47:05 +02:00
Paula Oliveira 0fe2e6950e
Improving code related to the Agent properties (#6348)
Co-authored-by: Paula Zomignani Oliveira <paula@scclouds.com.br>
Co-authored-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2022-12-22 12:00:49 +01:00
Daniel Augusto Veronezi Salvador 7b752c3077
Externalize KVM Agent storage's timeout configuration (#5239)
* Externalize KVM Agent storage's timeout configuration

* Address @nvazquez review

* Add empty line at the end of the agent.properties file

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-28 15:45:27 +02:00
dahn 6f93e5cd08
Revert "Externalize kvm agent storage timeout configuration (#4585)" (#5218)
This reverts commit 05a978c249.
2021-07-20 09:16:43 +02:00
Daniel Augusto Veronezi Salvador 05a978c249
Externalize kvm agent storage timeout configuration (#4585)
* Externalize KVM Agent storage's timeout configuration

Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.

* Refactored KVHAMonitor nested thread and changed some logs

* It has been added the timeout's config in the agent.properties file

* Rename classes

* Rename var and remove comment

* Fix typo with word "heartbeat"

* Extract multiple methods call to variables

* Add unit tests to file handler

* Increase info about the property

* Create inner class Property

* Rename method getProperty to getPropertyValue

* Remove copyright

* Remove copyright

* Extract code to createHeartBeatCommand

* Change method access from protected to private

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-19 09:41:50 +02:00
Nicolas Vazquez 73122fd0a9
[KVM] Direct download agnostic of the storage provider (#3828)
* Remove constraint for NFS storage

* Add new property on agent.properties

* Add free disk space on the host prior template download

* Add unit tests for the free space check

* Fix free space check - retrieve avaiable size in bytes

* Update default location for direct download

* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope

* Verify location for temporary download exists before checking free space

* In progress - refactor and extension

* Refactor and fix

* Last fixes and marvin tests

* Remove unused test file

* Improve logging

* Change default path for direct download

* Fix upload certificate

* Fix ISO failure after retry

* Fix metalink filename mismatch error

* Fix iso direct download

* Fix for direct download ISOs on local storage and shared mount point

* Last fix iso

* Fix VM migration with ISO

* Refactor volume migration to remove secondary storage intermediate

* Fix simulator issue
2020-03-06 19:56:54 +01:00
Rohit Yadav 8ef131745a Merge branch '4.11' 2018-03-15 16:46:50 +05:30
Rafael Weingärtner c591c5ad3e CLOUDSTACK-10248: Fix errors that appeared after #2283 (#2417)
This fixes move refactoring error introduced in #2283 
For instance, the class DatadiskTO is supposed to be in com.cloud.agent.api.to package. However, the folder structure it was placed in is com.cloud.agent.api.api.to.

Skip tests for cloud-plugin-hypervisor-ovm3:
For some unknown reason, there are quite a lot of broken test cases for cloud-plugin-hypervisor-ovm3. They might have appeared after some dependency upgrade and was overlooked by the person updating them. I checked them to see if they could be fixed, but these tests are not developed in a clear and clean manner. On top of that, we do not see (at least I) people using OVM3-hypervisor with ACS. Therefore, I decided to skip them.

Identention corrected to use spaces instead of tabs in XML files
2018-01-23 12:19:36 +01:00
Marc-Aurèle Brothier 893a88d225 CLOUDSTACK-10105: Use maven standard project structure in all projects (#2283)
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.

- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml

Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
2018-01-20 03:19:27 +05:30