Commit Graph

23 Commits

Author SHA1 Message Date
Abhishek Kumar 0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
Vishesh a4224e58cc
Improve logging to include more identifiable information (#9873)
* Improve logging to include more identifiable information for kvm plugin

* Update logging for scaleio plugin

* Improve logging to include more identifiable information for default volume storage plugin

* Improve logging to include more identifiable information for agent managers

* Improve logging to include more identifiable information for Listeners

* Replace ids with objects or uuids


* Improve logging to include more identifiable information for engine

* Improve logging to include more identifiable information for server

* Fixups in engine

* Improve logging to include more identifiable information for plugins

* Improve logging to include more identifiable information for Cmd classes

* Fix toString method for StorageFilterTO.java
2025-01-06 16:42:37 +05:30
John Bampton c923e673cf
pre-commit: add `XML` files to the `trailing-whitespace` check (#9131) 2024-07-12 09:42:54 +02:00
João Jandre 49cecaed06
Normalize loggers and upgrade log4j 1.2 to log4j 2.19 (#7131)
* Normalize logs

All classes that could have their loggers inherited from their fathers had their own loggers deleted;
Most loggers didn't have to be static, so most of them were normalized so that they wouldn't be;
All loggers are protected now;
Static logger's name are now 'LOGGER';
Non-static logger's name are now 'logger';
New class DbUpgradeAbstractImpl created so that all Upgraders extend it and inherit its logger

* Upgrade log4j

* fix errors caused by the merge

* Refactor cglibThrowableRenderer functionality to log4j2 and upgrade the last configuration files

* fix sonarcloud bug

* Fix errors caused by merge, remove some unused loggers, and rename a variable that was mistakenly renamed on the normalization commit

* Readd snmpTrapAppender, remove TestAppender

* Regenerate changes

* regenerate changes

* refactor last custom appender

* fix systemvm configuration xml

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Fix utils pom

* fix some tests

* regenerate changes

* Fix jar being printed on exception

* fix logging in system VMs, fix commands not having log4j2 classpath.

* regenerate changes

* Fix some unwanted renomeations

* fix end of file

* regenerate changes

* regenerate changes

* fix merge error

* regenerate changes

* fix tests

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* readd reload4j to tungsten as juniper depends on it

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* re-add reload4j dependency to network-contrail, as juniper depends on it

* regenerate changes

* regenerate changes

* regenerate changes

* fix typo

* regenerate changes

* regenerate changes

* Fix end of files

* regenerate changes

* add logj42 to cloud-utils-SHADED.jar

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* fix some tests

* Regenerate changes

* Regenerate changes

* fix test

* Regenerate changes

* Regenerate changes
2024-02-08 09:55:41 -03:00
Vishesh ea90848429
Feature: Add support for DRS in a Cluster (#7723)
This pull request (PR) implements a Distributed Resource Scheduler (DRS) for a CloudStack cluster. The primary objective of this feature is to enable automatic resource optimization and workload balancing within the cluster by live migrating the VMs as per configuration.
Administrators can also execute DRS manually for a cluster, using the UI or the API.
Adds support for two algorithms - condensed & balanced. Algorithms are pluggable allowing ACS Administrators to have customized control over scheduling.

Implementation
There are three top level components:

    Scheduler
    A timer task which:

    Generate DRS plan for clusters
    Process DRS plan
    Remove old DRS plan records

    DRS Execution
    We go through each VM in the cluster and use the specified algorithm to check if DRS is required and to calculate cost, benefit & improvement of migrating that VM to another host in the cluster. On the basis of cost, benefit & improvement, the best migration is selected for the current iteration and the VM is migrated. The maximum number of iterations (live migrations) possible on the cluster is defined by drs.iterations which is defined as a percentage (as a value between 0 and 1) of total number of workloads.

    Algorithm
    Every algorithms implements two methods:
        needsDrs - to check if drs is required for cluster
        getMetrics - to calculate cost, benefit & improvement of a migrating a VM to another host.

Algorithms

    Condensed - Packs all the VMs on minimum number of hosts in the cluster.
    Balanced - Distributes the VMs evenly across hosts in the cluster.
    Algorithms use drs.level to decide the amount of imbalance to allow in the cluster.

APIs Added

listClusterDrsPlan

    id - ID of the DRS plan to list
    clusterid - to list plans for a cluster id

generateClusterDrsPlan

    id - cluster id
    iterations - The maximum number of iterations in a DRS job defined as a percentage (as a value between 0 and 1) of total number of workloads. Defaults to value of cluster's drs.iterations setting.

executeClusterDrsPlan

    id - ID of the cluster for which DRS plan is to be executed.
    migrateto - This parameter specifies the mapping between a vm and a host to migrate that VM. Format of this parameter: migrateto[vm-index].vm=<uuid>&migrateto[vm-index].host=<uuid>.

Config Keys Added

    ClusterDrsPlanExpireInterval
    Key drs.plan.expire.interval
    Scope Global
    Default Value 30 days
    Description The interval in days after which old DRS records will be cleaned up.

    ClusterDrsEnabled
    Key drs.automatic.enable
    Scope Cluster
    Default Value false
    Description Enable/disable automatic DRS on a cluster.

    ClusterDrsInterval
    Key drs.automatic.interval
    Scope Cluster
    Default Value 60 minutes
    Description The interval in minutes after which a periodic background thread will schedule DRS for a cluster.

    ClusterDrsIterations
    Key drs.max.migrations
    Scope Cluster
    Default Value 50
    Description Maximum number of live migrations in a DRS execution.

    ClusterDrsAlgorithm
    Key drs.algorithm
    Scope Cluster
    Default Value condensed
    Description DRS algorithm to execute on the cluster. This PR implements two algorithms - balanced & condensed.

    ClusterDrsLevel
    Key drs.imbalance
    Scope Cluster
    Default Value 0.5
    Description Percentage (as a value between 0.0 and 1.0) of imbalance allowed in the cluster. 1.0 means no imbalance
    is allowed and 0.0 means imbalance is allowed.

    ClusterDrsMetric
    Key drs.imbalance.metric
    Scope Cluster
    Default Value memory
    Description The cluster imbalance metric to use when checking the drs.imbalance.threshold. Possible values are memory and cpu.
2023-10-26 11:48:18 +05:30
John Bampton 6f4503488b
pre-commit: apply `end-of-file-fixer` to all files (#7551) 2023-08-02 13:47:21 +02:00
dahn 0f1cd6009d
add logging to deployment planners (#5859)
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>

Co-authored-by: Daan Hoogland <dahn@onecht.net>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
2022-02-04 17:02:32 +01:00
Marc-Aurèle Brothier 893a88d225 CLOUDSTACK-10105: Use maven standard project structure in all projects (#2283)
Remove maven standard module (which only a few were using) and get ride of maven customization for the projects structure.

- moved all directories to src/main/java, src/main/resources, src/main/scripts, src/test/java, src/test/resources
- grep scan to search for src/com and src/org left over
- grep for <project>/scripts to fix pom.xml configuration
- remove custom <build> configuration in pom.xml

Signed-off-by: Marc-Aurèle Brothier <m@brothier.org>
2018-01-20 03:19:27 +05:30
cirstofolini 1a64c247ad Removed unnecessary @Local annotations and their respective imports from the ComponentLifecycleBase class and its subclasses. 2015-11-21 18:31:11 -02:00
Rajani Karuturi 8bc0294014 Revert "Merge pull request #714 from rafaelweingartner/master-lrg-cs-hackday-003"
This reverts commit cd7218e241, reversing
changes made to f5a7395cc2.

Reason for Revert:

noredist build failed with the below error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project cloud-plugin-hypervisor-vmware: Compilation failure
[ERROR] /home/jenkins/acs/workspace/build-master-noredist/plugins/hypervisors/vmware/src/com/cloud/hypervisor/guru/VMwareGuru.java:[484,12] error: non-static variable logger cannot be referenced from a static context
[ERROR] -> [Help 1]

even the normal build is broken as reported by @koushik-das on dev list
http://markmail.org/message/nngimssuzkj5gpbz
2015-08-31 11:27:57 +05:30
Rafael Weingartner 3818257a68 Solved jira ticket: CLOUDSTACK-8750 2015-08-28 22:35:08 -03:00
Alena Prokharchyk 6c23e201ad 1) More fixes for the problems found by findBugs
2) Corrected some logging in  MidoNetPublicNetworkGuru - removed .toString method call on the objects in the log body as toString is called on the object by default when use log4j
2014-03-13 16:05:45 -07:00
Alex Huang d620df2bdd Reformatted all of the code. 2013-11-21 06:15:26 -08:00
Alex Huang 8d62744681 Reformat all source code. Added checkstyle to check the source code 2013-11-20 07:26:53 -08:00
Darren Shepherd f62e28c1ec New Transaction API
Introduction of a new Transaction API that is more consistent with the style
of Spring's transaction managment.  The existing Transaction class was renamed
to TransactionLegacy.  All of the non-DAO code in the management server has been
updated to use the new Transaction API.
2013-10-16 09:21:00 -07:00
Prachi Damle f31c318158 Changes required to merge to master:
- Replace UserContext by CallContext
2013-09-03 20:03:11 -07:00
Prachi Damle a0b48a45c0 CLOUDSTACK-4222: [VMWare] NPE: VM Failed to start after Volume Migration
- ExplicitDedicationProcessor should process only if group of this type is used!
2013-09-03 20:03:05 -07:00
Prachi Damle 25cc9eb869 CLOUDSTACK-4234:Dedicated Resources: When multiple dedication groups are chosen for VM deployment, dedicated resources belonging to both groups should be considered
Changes:
- Do not add the dedicated resource to avoid list if it is present in the list of resources to consider for the deployment.
2013-09-03 20:02:50 -07:00
Prachi Damle 5628153c59 CLOUDSTACK-4221: Dedicated Resources: changes to associate the dedicated resource with the 'ExplicitDedication' affinity group
Changes:
- Adding mocks in unit tests for new injected components

Conflicts:

	server/test/org/apache/cloudstack/networkoffering/ChildTestConfiguration.java
2013-09-03 20:02:44 -07:00
Prachi Damle ef22b42b38 CLOUDSTACK-4221: Dedicated Resources: changes to associate the dedicated resource with the 'ExplicitDedication' affinity group
Changes:
- Implict creation of the 'ExplicitDedication' Affinity group during resource dedication
- Only one group per account or per domain will be present
- ListDedicatedResources by affinityGroup
- Deployment should consider dedicated resources associated to the group only
- Deleting affinity group should release the dedicated resouces
- Releasing the dedicated resources should remove the group associated if there are no more resources.

Conflicts:

	plugins/dedicated-resources/src/org/apache/cloudstack/dedicated/DedicatedResourceManagerImpl.java
	plugins/dedicated-resources/test/org/apache/cloudstack/dedicated/manager/DedicatedApiUnitTest.java
	server/src/com/cloud/configuration/ConfigurationManagerImpl.java
2013-09-03 20:02:38 -07:00
Prachi Damle a06bd9fa2b CLOUDSTACK-4168 Root Admin should be able to create 'ExplicitDedication' affinity group at domain level and make it available for all accounts in the domain
Changes:
- 'ExcplicitDedication' type of group can be created/deleted by Root admin only
- Users can no longer create this type of affinity group
- RootAdmin can create this type of affinitygroup at domain level. Such a domain level group is available for all accounts in that domain for listing and for use during deployVM.
- The domain level affinitygroup should be visible to the users in that domain, domain admins and Root admin.

Conflicts:

	server/src/com/cloud/api/query/QueryManagerImpl.java
	server/src/org/apache/cloudstack/affinity/AffinityGroupServiceImpl.java
	server/test/org/apache/cloudstack/affinity/AffinityApiUnitTest.java
2013-09-03 20:02:34 -07:00
Alex Huang 1325014a03 Changed VirtualMachineProfile to be non-generic. From here on VirtualMachineManager will only manage vm instance. It doesn't understand the difference between different types of VMs. This makes the vmsync code to be generic across all vms. 2013-07-22 11:48:11 -07:00
Saksham Srivastava 17267794ad CLOUDSTACK-681: Dedicated Resources - Explicit Dedication, Private zone, pod, cluster or host. <Patch1>
This feature allows a user to deploy VMs only in the resources dedicated to his account or domain.

1. Resources(Zones, Pods, Clusters or hosts) can be dedicated to an account or domain.
   Implemented 12 new APIs to dedicate/list/release resources:
   - dedicateZone, listDedicatedZones, releaseDedicatedZone for a Zone.
   - dedicatePod, listDedicatedPods, releaseDedicatedPod for a Pod.
   - dedicateCluster, listDedicatedClusters, releaseDedicatedCluster for a Cluster
   - dedicateHost, listDedicatedHosts, releaseDedicatedHost for a Host.
2. Once a resource(eg. pod) is dedicated to an account, other resources(eg. clusters/hosts) inside that cannot be further dedicated.
3. Once a resource is dedicated to a domain, other resources inside that can be further dedicated to its sub-domain or account.
4. If any resource (eg.cluster) is dedicated to a account/domain, then resources(eg. Pod) above that cannot be dedicated to different accounts/domain (not belonging to the same domain)
5. To use Explicit dedication, user needs to create an Affinity Group of type 'ExplicitDedication'
6. A VM can be deployed with the above affinity group parameter as an input.
7. A new ExplicitDedicationProcessor has been added which will process the affinity group of type 'Explicit Dedication' for a deployment of a VM that demands dedicated resources.
   This processor implements the AffinityGroupProcessor adapter. This processor will update the avoid list.
8. A VM requesting dedication will be deployed on dedicatd resources if available with the user account.
9. A VM requesting dedication can also be deployed on the dedicated resources available with the parent domains iff no dedicated resources are available with the current user's account or
   domain.
10. A VM (without dedication) can be deployed on shared host but not on dedicated hosts.
11. To modify the dedication, the resource has to be released first.
12. Existing Private zone functionality has been redirected to Explicit dedication of zones.
13. Updated the db upgrade schema script. A new table "dedicated_resources" has been added.
14. Added the right permissions in commands.properties
15. Unit tests:  For the new APIs and Service, added unit tests under : plugins/dedicated-resources/test/org/apache/cloudstack/dedicated/DedicatedApiUnitTest.java
16. Marvin Test: To dedicate host, create affinity group, deploy-vm, check if vm is deployed on the dedicated host.
2013-05-30 01:07:01 -07:00