cloudstack

Commit Graph

Author	SHA1	Message	Date
Abhishek Kumar	5e98405b38	Merge remote-tracking branch 'apple/apple-base418' into scalability-improvements	2024-07-22 16:12:19 +05:30
Suresh Kumar Anaparti	d1faa59677	Back port fixes from upstream 4.19 (#466) * Fixed src datastore on copy check for PowerFlex/ScaleIO storage driver (#9310) * Ignore non-managed pools for storage pool access preparation (#9376)	2024-07-19 09:38:11 +05:30
Suresh Kumar Anaparti	5c682677fc	Support resource name / displaytext with unicode / emoji chars, and SQL exception msg improvements (#460) * Don't send sql exception/query from dao to upper layer; log it and send only the error message * Updated charset to utf8mb4 for display_name column/user_vm table and job_result column/async_job table to support unicode chars & emojis * Added API arg validator for RFC-compliant domain names, to validate VM's host name * Updated user resources name / display name column's charset to utf8mb4 * Check and update charset for affinity group name to utf8mb4, from the data migration in upgrade path * Updated backup offering name column charset to utf8mb4 * Added unit tests for vm host/domain name validation * Added smoke test to check resource name for vm, volume, service & disk offering, template, iso, account (first/last name) * Updated resource annotation charset to utf8mb4 * Updated some resources' description charset to utf8mb4	2024-07-19 09:35:18 +05:30
Rohit Yadav	a142359784	saml: make default signature check mandatory Backport https://github.com/apache/cloudstack/pull/9357	2024-07-12 09:40:59 +05:30
Rohit Yadav	b46e4d4bbf	framework/cluster: improve cluster service and integration API service (#465) - mTLS implementation for cluster service communication - Listen only on the specified cluster node IP address instead of all interfaces - Validate that incoming cluster service requests are from peer management servers, based on the server's certificate DNS name, which can be configured through the global config ca.framework.cert.management.custom.san - Hardening of KVM command wrapper script execution - Improve API server integration port check - cloudstack-management.default: don't have JMX configuration if not needed. JMX is used for instrumentation; users who need to use it should enable it explicitly Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com> Co-authored-by: Wei Zhou <weizhou@apache.org> Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com> Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com> (cherry picked from commit `4f5561937c`) Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-07-09 09:03:40 +05:30
Vishesh	c6d35b31ca	Log stdout to a file (#399) * Log stdout to a file * Add logrotation * Fixup permissions in log file * Remove info logs in stdout * Change output file names Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com> * Fix logrotate config * Disable logging to stdout --------- Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>	2024-07-03 20:51:20 +05:30
Marcus Sorensen	23a0faf729	Apply upstream SAML sig check from #9219 (#463) Co-authored-by: Marcus Sorensen <mls@apple.com>	2024-07-01 09:33:40 +05:30
Abhishek Kumar	08246e05ed	server,test: fix resourceid for VOLUME.DESTROY in restore VM (#9032) (#454) Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>	2024-06-28 12:01:55 +05:30
Suresh Kumar Anaparti	be87b1a668	FR74: Mitigation for non-scalable ScaleIO clients (#447) * Mitigation for non-scalable Powerflex/ScaleIO clients - Added ScaleIOSDCManager to manage SDC connections, check the clients limit, and prepare and unprepare SDC on the hosts. - Added commands to prepare and unprepare storage clients, which prepare/start and stop the SDC service respectively on the hosts. - Introduced config 'storage.pool.connected.clients.limit' at storage level for client limits, currently supported for Powerflex only. * tests issue fixed * refactor / improvements * lock with powerflex systemid while checking connections limit * updated powerflex systemid lock to hold till sdc preparation * Added custom stats support for storage pool, through listStoragePools API * code improvements, and unit tests * Update config 'storage.pool.connected.clients.limit' to dynamic, and some improvements * Stop SDC on host after migration if no volumes are mapped to host * Wait for SDC to connect after scini service start, and some log improvements * Do not throw exception (log it) when SDC is not connected while revoking access for the powerflex volume * some log improvements	2024-06-27 18:47:50 +05:30
Vishesh	c2de75744e	kvm: Add support for cgroupv2 (#8252) (#459) * kvm: Add support for cgroupv2 (#8252) 1. Problem description In Apache CloudStack (ACS), when a VM is deployed in a host with the KVM hypervisor, an XML file is created in the assigned host, which has a property, shares, that defines the weight of the VM to access the host CPU. The value of this property has no unit, and it is a relative measure to calculate how much CPU a given VM will have in the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies in the range of values allowed for shares, which differs between the two versions: [2, 262144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both are specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000 and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default one in current Linux distributions; thus, it is necessary to address this limitation. Equation 1: shares = CPU * speed Fixes: #6744 2. Proposed changes To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, the number of cores and the nominal speed of the host's CPU are used as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion. The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed at 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS. Equation 2: shares = (VM requested shares * cgroup upper limit) / host max shares To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM's calculated shares will be checked to ensure they do not exceed the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, scaling a VM will apply the same verification against the VM's host. To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency. It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now. * Update overcommit ratio during live VM migration * minor refactoring --------- Co-authored-by: Bryan Lima <42067040+BryanMLima@users.noreply.github.com>	2024-06-27 12:22:17 +05:30
Vishesh	7ed43e3e43	Let network guru decide if ipv6 cidr size can't be equal to 64 (#462)	2024-06-27 12:20:49 +05:30
Vishesh	8be18e587f	FR75 Enforce strict host tag checking (#421) * Enforce strict host tag checking * Add e2e tests * Add more information to error log * Fix e2e test * Update global settings description * fixup * Fix e2e test teardown	2024-06-25 14:38:59 +05:30
Abhishek Kumar	8f88103a29	FR72 - api,server: purge expunged resources (#405) This PR introduces the functionality of purging removed DB entries for CloudStack entities (currently only for VirtualMachine). There are three mechanisms for purging removed resources: - Background task - CloudStack will run a background task that runs at a defined interval. Other parameters for this task can be controlled with new global settings. - API - New API `purgeExpungedResources`. It accepts the following parameters: resourcetype, batchsize, startdate, enddate - Config for service offering. Service offerings can be created with the purgeresources parameter, which allows purging resources immediately on expunge. The following new global settings have been added: - `expunged.resources.purge.enabled`: Default: false. Whether to run a background task to purge the DB records of the expunged resources. - `expunged.resources.purge.resources`: Default: (empty). A comma-separated list of resource types that will be considered by the background task to purge the DB records of the expunged resources. Currently only VirtualMachine is supported. An empty value will result in considering all resource types for purging. - `expunged.resources.purge.interval`: Default: 86400. Interval (in seconds) for the background task to purge the DB records of the expunged resources. - `expunged.resources.purge.delay`: Default: 300. Initial delay (in seconds) to start the background task to purge the DB records of the expunged resources. - `expunged.resources.purge.batch.size`: Default: 50. Batch size to be used during purging of the DB records of the expunged resources. - `expunged.resources.purge.start.time`: Default: (empty). Start time to be used by the background task to purge the DB records of the expunged resources. Use format `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss`. - `expunged.resources.purge.keep.past.days`: Default: 30. The number of days before the execution time of the background task for which the DB records of expunged resources must not be purged. To purge the DB records of all resources expunged up to the execution of the background task, set the value to zero. - `expunged.resource.purge.job.delay`: Default: 180. Delay (in seconds) to execute the purging of the DB records of an expunged resource initiated by the configuration in the offering. The minimum value is 180 seconds; if a lower value is set, the minimum value will be used. Upstream PRs: https://github.com/apache/cloudstack/pull/8999 https://github.com/apache/cloudstack-documentation/pull/397 Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com> Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>	2024-06-19 12:59:50 +05:30
Suresh Kumar Anaparti	04091abc0d	User data content size validation, register managed user data using POST call from UI, and related code improvements (#361) * Validate user data with actual length, and some code improvements * Ignore if user data is not set (don't fail) * Validate user data after finalizing it * Updated registerUserData API to use POST call from UI, to support user data up to 1048576 bytes * Apply suggestions from code review * Added logs for user data * Addressed review comments * Check user data length with base64 encoded data, and some code improvements	2024-06-19 12:54:32 +05:30
Suresh Kumar Anaparti	bda0543dd0	ScaleIO volume live migration - use usable bytes from source disk to format the destination disk (#452) * ScaleIO volume live migration - use usable bytes from source disk to format the destination disk * Don't abort block copy job when cur,end = 0 * code improvements	2024-06-10 14:12:06 +05:30
Abhishek Kumar	256051af1d	server: fix resource reservation leakage (#456) * server: fix resource reservation leakage Fixes #453 Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com> * refactor Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com> * Fix resource reservation leftover entries (#455) * Resolve comments * Address comments --------- Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com> Co-authored-by: Vishesh <vishesh92@gmail.com>	2024-06-10 12:29:45 +05:30
Wei Zhou	e065c93c3f	Apple FR76: Implicit host tags (#427) * Merge two HostTagVO and HostTagDaoImpl * Apple FR76: dynamic host tags * Revert "Apple FR76: dynamic host tags" This reverts commit 01b93a873f167018c4fafd0744c0de07ae4de4ed. * Apple FR76: Implicit host tags * Apple FR76: address Abhishek's comments * Apple FR76: move updateImplicitTags * Apple FR76: add since to other two responses * Update 8929: add unit test in LibvirtComputingResourceTest * Update variable names * Update FR76: add explicithosttags in response * Update FR76 UI: Update explicit host tags * Update 8929: remove host tags and change labels on UI * Update: ui polish for host tags * fix since in responses * Update 8929: fix UI error if no host tags	2024-05-30 17:20:37 +05:30
Rohit Yadav	07097849d4	add more missing indexes to lower table scans Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-23 18:55:17 +05:30
Rohit Yadav	b03d1382e6	fix unit test failures Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-23 10:23:32 +05:30
Rohit Yadav	0f44a7f900	.python-version: bump to v3.10 Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	2f0f0e9ebc	engine/schema: call initDB before creating app context bean Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	3883dbe9a0	schema: force index on user_view_view In environments with a large number of shared networks or IP addresses (10k+), the view otherwise causes millions of table scans in the user_ip_address table, leading to severe slowness in listVM APIs etc. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	f57f244863	schema: speed up network offering created table scans Using a function in the view was causing too many scans, as many rows as the number of domains and zones. This reduces table scans where left joins happen, by using sub-queries. The effect is seen in slightly faster create network API performance. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	c3867a941f	more fixmes and todos Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	7a7f1e2b6e	FIXME/TODO: CPU and DB hotspot found Found these CPU and DB hotspots that handle agent ping commands; they add idle load when there is a high number of hosts. By design, there isn't any quick win here. However, the power sync report/handling could be improved, so it doesn't need to kick in for every ping command received. A few more areas are marked in the codebase. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	5603bf9c1a	engine: optimise CPU and DB hotspot to return enabled hypervisors in the zone This refactors the ResourceManager::listAvailHypervisorInZone method, which should return unique hypervisors for which existing hosts are Up and processed. We can approximate this by assuming that those hosts would have set up their hypervisor-specific systemvmtemplates. A given environment wouldn't have thousands of systemvmtemplates, but can have thousands of hosts. So, instead of scanning the entire cloud.host table, we can make a calculated guess by returning unique hypervisors of systemvm templates which are ready. This method is used in ::processConnect() when an agent joins, so optimising it speeds up agent connect handling. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	8a320b807d	engine/schema: cluster dao method query optimisation Replace list.size() with getCount(). Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	696927455f	framework/db: use HikariCP instead of dbcp2 Replaces dbcp2 connection pool library with the more performant HikariCP. With this change, unit tests are failing but the build is passing. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	f21a00b2de	framework/db: use lightweight-ping As per the docs, Connector/J can use a validation query starting with /* ping */ (such as /* ping */ SELECT 1) to send lightweight application pings to the server: https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-j2ee-concepts-connection-pooling.html Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	1c02166d29	framework/db: don't use validation query as connector is JDBC4 compliant Per docs, if the MySQL connector is JDBC4 compliant then it should use the Connection.isValid API to test a connection. (https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#isValid-int-) This significantly reduces query lag and improves API throughput, as otherwise one or two SELECT 1 queries are performed every time a Connection is given to application logic. This should only be accepted when the driver is JDBC4 compliant. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:39 +05:30
Rohit Yadav	90afcf2f85	metrics: optimise code and query to get summed cpu sockets Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	35462dc96d	server: fix full table scanning for listHosts API The type parameter isn't a keyword, but a simple listHosts API call with type=Routing runs SELECT COUNT(*) FROM host WHERE host.type LIKE '%Routing' AND host.removed IS NULL; ... which causes an unnecessary full table scan. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	076a712fbe	schema: add indexes that save DB from too many scans Speeds up several APIs, especially host and VM listing APIs and VM deployment Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	54accfdc0a	schema: add missing index to reduce table scans Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	5750e56be5	server: improve DB optimisation, indexing and reduce table scans In this example commit, we look at: - Adding missing indexes to speed up queries - Reducing table scans by optimising SQL queries and using indexes - Optimising SQL queries to remove duplicate rows (use of distinct) - Reducing CPU and DB load by using jprofiler to optimise both SQL queries and CPU hotspots server: reduce CPU and DB load caused by systemvm ::isZoneReady() In this case, the SQL query caused a large number of table scans only to determine whether the zone has any available pool+host to launch systemvms. Accordingly, the code and SQL queries were optimised, along with indexes, to lower both DB scans and management server CPU load. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	3a0927a568	server: trace logs for security groups listener Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	607911562e	server: fix NPE, compare known versus unknown in equals() Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	807cd6a830	metrics: speed up list zones and cluster metrics APIs Also add a flag to disable on-the-fly metrics computation when the list metrics APIs for zones and clusters are called. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	72b841567e	ui: add disconnected hosts filter and improve admin dashboard Adds disconnected as a host filter in the UI. Improves the capacity dashboard for admins in large environments. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	5484d3c7e6	orchestration: optimise vm list fetching excluding those reported This optimises the sql query and iterator to simply return the VMs list excluding those in the received report. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	de82aa8e91	engine/orchestration: wrap db txn in try-with, only fetch id Optimises a DB query that seems to run for every Ping command, where all columns are fetched but only the `id` column is used. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	c01aad6ba8	server: count hosts rather than get all hosts in capacity scans This refactors hotspot code to fetch just the count of hosts rather than all the host VOs for a zone during capacity scans for systemvms. This reduces CPU and DB load in really large (10k+ hosts) environments. Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	2a48d71909	server: don't go into O(n^2) loop for non-XenServer hosts The logic, introduced in https://github.com/apache/cloudstack/pull/1403, is now gated to XenServer, the only hypervisor where it would run at all. The specific code is only applicable for XenServer and SolidFire (https://youtu.be/YQ3pBeL-WaA?si=ed_gT_A8lZYJiEh). Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:22:38 +05:30
Rohit Yadav	47163df2ff	framework/config: make logic in ::value() defensive (#449) This adds an NPE check on s_depot.global(), which can cause an NPE in unit tests where s_depot is not null but the underlying config dao returned via `s_depot.global()` is null (not mocked or initialised). Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>	2024-05-22 20:20:37 +05:30
Vishesh	c3eba5e213	Fix exceeding of resource limits with powerflex (#443) * Fix exceeding of resource limits with powerflex * Fix for volume prepare during VM start * resolve comments * Add e2e tests * Fixup * Update e2e tests * minor refactoring * refactoring * fixup --------- Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>	2024-05-08 20:54:54 +05:30
Vishesh	2f4cea6dca	Fix message publish in transaction (#438) * Fix message publish in transaction * Resolve comments	2024-05-07 13:27:19 +05:30
Vishesh	04a589d013	Fixup e2e test_restore_vm (#445) * Fixup e2e test_restore_vm * Fix template's size attribute * Resolve comments	2024-05-07 12:59:42 +05:30
Vishesh	7fae1fc747	Fix restore VM with allocated root disk (#441) * Fix restore VM with allocated root disk * Add e2e test for restore vm * Add more checks for e2e test	2024-04-29 12:18:55 +05:30
Vishesh	9ab786c18a	Fix: Update rootdisksize detail on restore VM (#440) * Fix: Update rootdisksize detail on restore VM * Update server/src/main/java/com/cloud/vm/UserVmManagerImpl.java Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com> * minor fixup --------- Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>	2024-04-29 12:14:44 +05:30
Vishesh	1b54edd9de	Fix resource limit checks and increment/decrements for different operations (#430) * Fix resource limit checks and increment/decrements for different operations * Fixup * More fixups * fixup * Refactor code * Resolve comments * Some minor code refactoring * Fixup * fixup * Fix method name * Fixup * Fixup listing	2024-04-24 17:56:33 +05:30
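Equations 1 and 2 in commit c2de75744e can be checked with a little arithmetic. The sketch below is a minimal Python illustration, not the ACS implementation (which is Java); the function names are hypothetical, and rounding down to an integer with a floor of 1 is an assumption, since the commit message does not say how fractional shares are handled. A 6-core, 2 GHz offering requests 12000 shares, above the cgroup v2 limit of 10000; on an assumed 32-core, 2.6 GHz host it scales to 1442.

```python
# Sketch of the share calculation described in commit c2de75744e.
# Function names are hypothetical; ACS implements this in Java.

CGROUP_V2_MAX_SHARES = 10000  # cgroup v2 upper limit ("cgroup upper limit")


def requested_shares(cores: int, speed_mhz: int) -> int:
    """Equation 1: shares = CPU * speed."""
    return cores * speed_mhz


def scaled_shares_v2(vm_cores: int, vm_speed_mhz: int,
                     host_cores: int, host_speed_mhz: int) -> int:
    """Equation 2: (VM requested shares * cgroup upper limit) / host max shares."""
    vm_shares = requested_shares(vm_cores, vm_speed_mhz)
    # Host max shares: Equation 1 with the host's cores and nominal speed.
    host_max = requested_shares(host_cores, host_speed_mhz)
    if vm_shares > host_max:
        # Mirrors the deploy/migrate/scale check: this host cannot fit the VM.
        raise ValueError("VM requests more CPU shares than the host provides")
    # Assumption: round down, but never below the cgroup v2 minimum of 1.
    return max(1, vm_shares * CGROUP_V2_MAX_SHARES // host_max)


if __name__ == "__main__":
    print(requested_shares(6, 2000))            # 12000, above the v2 limit
    print(scaled_shares_v2(6, 2000, 32, 2600))  # 1442
```

A VM that asks for the whole host maps exactly to 10000, so the scaled value always stays inside [1, 10000] once the host check has passed.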
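Commit 5c682677fc allows unicode and emoji in display names but adds an API argument validator so that a VM's host name stays an RFC-compliant domain name. The message does not spell out the rules. The sketch below assumes the RFC 1123 ones: labels of 1 to 63 ASCII letters, digits, or hyphens that neither start nor end with a hyphen, and at most 253 characters overall. The function name is hypothetical, and CloudStack's own check may be stricter.

```python
import re

# One DNS label per RFC 1123 (assumed rules): 1-63 ASCII letters/digits/hyphens,
# starting and ending with a letter or digit.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
_MAX_NAME_LENGTH = 253


def is_valid_hostname(name: str) -> bool:
    """Return True if name is a dot-separated sequence of valid labels."""
    if not name or len(name) > _MAX_NAME_LENGTH:
        return False
    # fullmatch (not match) so trailing characters such as "\n" are rejected;
    # empty labels from "a..b" or a trailing dot also fail.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


if __name__ == "__main__":
    for candidate in ["vm-01", "vm-01.example.com", "-vm", "vm_01", "vm😀"]:
        print(candidate, is_valid_hostname(candidate))
```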
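Commit 04091abc0d checks user data length against the base64-encoded data, supports up to 1048576 bytes, and ignores user data that is not set instead of failing. A minimal Python sketch of that order of checks follows. The function name is hypothetical, and applying the limit to the encoded string rather than the decoded bytes is an assumption based on the message's wording.

```python
import base64
import binascii
from typing import Optional

MAX_USER_DATA_LENGTH = 1048576  # bytes, per the commit message


def validate_user_data(encoded: Optional[str],
                       max_length: int = MAX_USER_DATA_LENGTH) -> Optional[bytes]:
    """Return the decoded user data, None if unset, or raise ValueError."""
    if not encoded:
        # "Ignore if user data is not set (don't fail)"
        return None
    # Assumption: the limit applies to the base64-encoded form.
    if len(encoded) > max_length:
        raise ValueError(
            f"user data is {len(encoded)} bytes encoded; limit is {max_length}")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("user data is not valid base64") from exc
```

Checking the length before decoding means an oversized payload is rejected without allocating its decoded copy.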
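The settings `expunged.resources.purge.start.time`, `expunged.resources.purge.keep.past.days`, and `expunged.resource.purge.job.delay` from commit 8f88103a29 combine into a simple date-window calculation. The Python sketch below interprets the setting descriptions; it is not the CloudStack code. The helper names are hypothetical, and treating the start time as a lower bound on the removal date is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

MIN_PURGE_JOB_DELAY_SECONDS = 180  # documented minimum for the offering-driven job


def parse_start_time(value: str) -> Optional[datetime]:
    """Parse `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss`; empty means no lower bound."""
    if not value:
        return None
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported start time format: {value!r}")


def purge_window(now: datetime, start_time: str = "",
                 keep_past_days: int = 30) -> Tuple[Optional[datetime], datetime]:
    """Return (earliest, latest) removal times eligible for purging.

    Records removed within the last keep_past_days days are kept;
    keep_past_days=0 makes everything removed up to `now` eligible.
    """
    return parse_start_time(start_time), now - timedelta(days=keep_past_days)


def effective_job_delay(configured_seconds: int) -> int:
    """Values below the minimum are replaced by the minimum."""
    return max(configured_seconds, MIN_PURGE_JOB_DELAY_SECONDS)
```

With the defaults, a run at 2024-06-19 12:00 would consider records removed before 2024-05-20 12:00. A real task would then delete the matching rows in chunks of `expunged.resources.purge.batch.size`.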
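Commit be87b1a668 checks `storage.pool.connected.clients.limit` while holding a lock keyed on the PowerFlex system id, and stops SDC on a host only when no volumes remain mapped to it. The sketch below models that bookkeeping in memory with Python threads. Everything in it is hypothetical: the class and method names, the in-memory set that stands in for real SDC start/stop calls, and the choice that a limit of 0 means unlimited.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Set


class SdcClientLimiter:
    """Hypothetical model of per-system SDC connection limits."""

    def __init__(self, clients_limit: int):
        # Stand-in for storage.pool.connected.clients.limit; 0 = unlimited (assumption).
        self.clients_limit = clients_limit
        self._registry_lock = threading.Lock()
        self._system_locks: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)
        self._connected: DefaultDict[str, Set[str]] = defaultdict(set)

    def _lock_for(self, system_id: str) -> threading.Lock:
        # Serialise creation of per-system locks so each id gets exactly one lock.
        with self._registry_lock:
            return self._system_locks[system_id]

    def prepare(self, system_id: str, host_id: str) -> bool:
        """Connect host_id to the system unless the clients limit is reached."""
        with self._lock_for(system_id):
            hosts = self._connected[system_id]
            if host_id in hosts:
                return True  # already connected; nothing to start
            if self.clients_limit > 0 and len(hosts) >= self.clients_limit:
                return False  # caller should pick another host
            hosts.add(host_id)  # stand-in for starting the SDC service
            return True

    def unprepare(self, system_id: str, host_id: str, mapped_volumes: int) -> bool:
        """Disconnect host_id only when it has no volumes mapped."""
        with self._lock_for(system_id):
            if mapped_volumes > 0:
                return False  # keep SDC running while volumes are mapped
            self._connected[system_id].discard(host_id)  # stand-in for stopping SDC
            return True
```

Holding the per-system lock across both the count and the connect step is what prevents two concurrent deployments from each seeing one free slot and together exceeding the limit.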
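Commit 35462dc96d notes that `type=Routing` became `host.type LIKE '%Routing'`, and the leading wildcard prevents an index lookup. Python's built-in sqlite3 module and EXPLAIN QUERY PLAN can reproduce the effect. This uses SQLite, not MySQL, and a simplified hypothetical `host` schema; plan text varies between engines and versions, but in both, a B-tree index cannot serve a pattern that begins with a wildcard.

```python
import sqlite3
from typing import List, Sequence


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column in all versions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, type TEXT, removed TEXT)")
conn.execute("CREATE INDEX idx_host_type ON host (type)")

# Leading wildcard: the index on `type` cannot narrow the search.
like_plan = query_plan(
    conn, "SELECT COUNT(*) FROM host WHERE type LIKE ? AND removed IS NULL", ("%Routing",))
# Exact match: the planner can seek directly into idx_host_type.
eq_plan = query_plan(
    conn, "SELECT COUNT(*) FROM host WHERE type = ? AND removed IS NULL", ("Routing",))

print(like_plan)
print(eq_plan)
```

On current SQLite versions, the first plan reports a SCAN of `host` and the second a SEARCH using `idx_host_type`, which is the same distinction a MySQL EXPLAIN would show as a full scan versus a `ref` lookup.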
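Commits 8a320b807d and c01aad6ba8 both replace fetching rows and counting them in Java (`list.size()`) with a database-side count. Here is the same idea in Python with sqlite3, against a simplified hypothetical `host` table. Both functions return the same number, but the second transfers one integer instead of every matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host (id INTEGER PRIMARY KEY, data_center_id INTEGER, removed TEXT)")
conn.executemany(
    "INSERT INTO host (data_center_id, removed) VALUES (?, ?)",
    [(1, None)] * 5 + [(1, "2024-01-01 00:00:00")] + [(2, None)] * 3,
)

ACTIVE_HOSTS = "FROM host WHERE data_center_id = ? AND removed IS NULL"


def count_hosts_by_fetching(conn: sqlite3.Connection, zone_id: int) -> int:
    # Anti-pattern: materialise every row, then count in application code.
    rows = conn.execute("SELECT * " + ACTIVE_HOSTS, (zone_id,)).fetchall()
    return len(rows)


def count_hosts(conn: sqlite3.Connection, zone_id: int) -> int:
    # Let the database count; a single integer comes back.
    (count,) = conn.execute("SELECT COUNT(*) " + ACTIVE_HOSTS, (zone_id,)).fetchone()
    return count


print(count_hosts_by_fetching(conn, 1), count_hosts(conn, 1))  # 5 5
```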
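Commit 47163df2ff makes `ConfigKey::value()` fall back safely when `s_depot` is set but `s_depot.global()` returns null, which happens in unit tests that do not mock the config DAO. The Python sketch below mirrors that guard. The classes are hypothetical stand-ins for the Java ConfigKey and ConfigDepot; `global_` carries a trailing underscore only because `global` is a Python keyword. A plain dict plays the role of the DAO, and returning the default value in the fallback case is an assumption about what "defensive" means here.

```python
from typing import Any, Callable, Mapping, Optional


class ConfigDepot:
    """Hypothetical stand-in: holds the global config DAO, which may be unset."""

    def __init__(self, global_dao: Optional[Mapping[str, str]] = None):
        self._global_dao = global_dao

    def global_(self) -> Optional[Mapping[str, str]]:
        return self._global_dao


class ConfigKey:
    # Shared depot, like the Java static s_depot; may be None or half-initialised.
    s_depot: Optional[ConfigDepot] = None

    def __init__(self, name: str, default: Any, parse: Callable[[str], Any] = str):
        self.name = name
        self.default = default
        self.parse = parse

    def value(self) -> Any:
        depot = ConfigKey.s_depot
        # Guard both levels: the depot itself and the DAO it returns.
        dao = depot.global_() if depot is not None else None
        if dao is None:
            return self.default
        raw = dao.get(self.name)
        return self.default if raw is None else self.parse(raw)
```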

36123 Commits (All Branches)