* forceha: fix vm is not started if it is powered off from inside
Steps to reproduce the issue:
(1) Make sure force.ha is true in the global settings. If not, change it to true and restart the management server.
(2) Create a service offering with HA not enabled.
(3) Create a VM.
(4) Log into the VM and power it off via the CLI.
Expected result: the VM is started again by CloudStack.
Actual result: the VM is not started.
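For reference, a minimal cloudmonkey sketch of these steps, assuming standard API parameters; the UUIDs and offering values are placeholders, and the management-server restart command depends on the packaging:
```
# (1) enable force.ha, then restart the management server
update configuration name=force.ha value=true
# on the management server host: systemctl restart cloudstack-management

# (2) service offering without HA
create serviceoffering name=no-ha-offering displaytext=no-ha-offering cpunumber=1 cpuspeed=500 memory=256 offerha=false

# (3) create a VM with that offering (IDs are placeholders)
deploy virtualmachine name=forceha-test serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid>

# (4) inside the guest OS, power it off: poweroff
```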
* forceha: fix vms are still running if host is force-removed
When a host is force-removed, the VMs are stopped in CloudStack but not on the host:
```
(localcloud) 🐱 > delete host id="a5625393-444d-4d0a-b31d-62baf88a8be1" forced=true
{
"success": true
}
```
After some minutes, VMs are still running on the host:
```
root@mgt01:~# ssh node63 virsh list
Id Name State
---------------------------
1 i-2-19-VM running
2 i-2-11-VM running
```
The error messages are:
```
Cannot transmit host 2 to Enabled state
com.cloud.utils.fsm.NoTransitionException: No next resource state found for current state = Enabled event = DeleteHost
at com.cloud.resource.ResourceManagerImpl.resourceStateTransitTo(ResourceManagerImpl.java:1216)
at com.cloud.resource.ResourceManagerImpl$1.doInTransactionWithoutResult(ResourceManagerImpl.java:907)
```
* forceha: Make ForceHA dynamic
If VM details contain rootdisksize, the volume entry in the DB should reflect the correct size when a VM reset is performed.
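A hedged cloudmonkey sketch of the scenario; the IDs and the 50 GB size are placeholders, and "VM reset" is assumed here to map to the restoreVirtualMachine API:
```
# deploy a VM with an overridden root disk size (in GB)
deploy virtualmachine name=rootsize-test serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid> rootdisksize=50

# reset (reinstall) the VM
restore virtualmachine virtualmachineid=<vm-uuid>

# the ROOT volume size reported here should still match rootdisksize
list volumes virtualmachineid=<vm-uuid> type=ROOT filter=name,size
```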
Fixes #3957
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
Fixes #4517
Adds capacity checks for RandomAllocator (host allocator)
Factors out the host CPU capability and capacity checks (with respect to the service offering) into CapacityManager.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* server: fix failure to remove template/iso if upload from local fails
When a template/ISO/volume upload from local fails, the install_path is not the full path of the file, so removing it fails.
```
mysql> select install_path from template_store_ref;
+--------------------------------------------------------------------+
| install_path |
+--------------------------------------------------------------------+
| template/tmpl/1/3/805f4763-248e-40ec-b79a-b868cc480d0a.qcow2 |
| template/tmpl/1/4/c7e32c9e-5e72-3726-85cf-aa5ccd84118d.qcow2 |
| template/tmpl/2/201/bc4f4f08-138a-31b8-af1a-d4450eff7982.qcow2 |
| template/tmpl/2/202 |
| template/tmpl/2/203/203-2-d47f8cde-a2a8-31e7-a826-2628ad98a6c8.iso |
| template/tmpl/2/204 |
| template/tmpl/5/205 |
| template/tmpl/2/206 |
| template/tmpl/2/207 |
| template/tmpl/2/208 |
| template/tmpl/2/209 |
| template/tmpl/2/210 |
+--------------------------------------------------------------------+
12 rows in set (0.00 sec)
mysql> select install_path from volume_store_ref;
+---------------------------------------------------------+
| install_path |
+---------------------------------------------------------+
| volumes/2/22 |
| volumes/2/19/f93face9-6521-4184-b89a-cb07f86bbae8.qcow2 |
| volumes/2/23 |
| volumes/2/24 |
+---------------------------------------------------------+
4 rows in set (0.00 sec)
```
* server: disallow removing template/iso in NotUpload and UploadInProgress state
* Prevent KVM from performing volume migrations of running instances
KVM cannot modify an instance's definition while it is in the running state. Therefore, it is not possible to easily change a volume's backend location.
There is a problem in the `migrateVolume` API. This API command ignores that limitation and causes an inconsistency in the database. ACS processes the migrate command, copies the volume to the destination storage, modifies the database and finishes the process successfully. However, the running backend is still using the "old volume file".
This PR prevents KVM from performing volume migrations while KVM instances are in the running state and informs the user of an alternative API command that enables such an operation on running instances.
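As an illustration, a hedged cloudmonkey sketch; the IDs are placeholders, and the alternative command is assumed to be migrateVirtualMachineWithVolume:
```
# rejected now: storage migration of a volume attached to a running KVM instance
migrate volume volumeid=<volume-uuid> storageid=<destination-pool-uuid>

# alternative: migrate the running instance together with its volume(s)
migrate virtualmachinewithvolume virtualmachineid=<vm-uuid> hostid=<destination-host-uuid> migrateto[0].volume=<volume-uuid> migrateto[0].pool=<destination-pool-uuid>
```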
* Update VolumeApiServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Co-authored-by: Rohit Yadav <rohit@apache.org>
Steps to reproduce the issue:
(1) Create 10,000 service offerings (via the DB changes below or cloudmonkey).
```
DROP PROCEDURE IF EXISTS cloud.insert_service_offering;
DELIMITER $$
CREATE PROCEDURE cloud.insert_service_offering()
BEGIN
    DECLARE count INT DEFAULT 10000;
    SET @offeringid = (select max(id)+1 from disk_offering);
    WHILE count > 0 DO
        INSERT INTO disk_offering (id,name,uuid,display_text,disk_size,type,created) values (@offeringid,'test-offering-wei',uuid(), 'test-offering-wei',0,'Service',now());
        INSERT INTO service_offering (id,cpu,speed,ram_size) values (@offeringid, 1, 500,256);
        SET @offeringid = @offeringid + 1;
        SET count = count - 1;
    END WHILE;
END $$
DELIMITER ;
CALL cloud.insert_service_offering();
mysql> CALL cloud.insert_service_offering();
Query OK, 0 rows affected (2 min 30.85 sec)
```
(2) Check the total time of the periodic capacity check in CloudStack.
Without this patch, it takes 2.5 seconds (2 hosts):
```
2021-01-15 16:10:12,793 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Running Capacity Checker ...
2021-01-15 16:10:15,287 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Done running Capacity Checker ...
```
With this patch, it takes 1.3 seconds (2 hosts):
```
2021-01-15 16:12:43,604 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Running Capacity Checker ...
2021-01-15 16:12:44,927 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Done running Capacity Checker ...
```
If there are 100 hosts, the total time will be reduced from 100+ seconds to around 10 seconds.
We can use cloudmonkey to scale a VM with a dynamic offering to the same offering but with a different cpunumber or memory.
Enable it in the UI to improve the user experience.
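For example, a minimal cloudmonkey sketch of such a scale call; the IDs are placeholders and the details[0].* key names are an assumption based on the dynamic-offering detail keys:
```
# scale the VM to the same dynamic offering, only changing the cpu/memory details
scale virtualmachine id=<vm-uuid> serviceofferingid=<same-dynamic-offering-uuid> details[0].cpuNumber=4 details[0].memory=4096
```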
This PR fixes an issue when moving a VM from one account to another.
Steps to reproduce the issue
(1) create a vm with multiple shared networks (in advanced zone, or advanced zone with security groups)
(2) create another account (in same domain who can also access the shared networks)
(3) move vm to new account, with a list of networkid
Expected result: the VM has NICs on the networks in the same order as specified in the API request, and the NICs keep the same IPs as before.
Actual result: the network order is not the same as specified, and the IPs are changed.
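A hedged cloudmonkey sketch of step (3); the VM has to be stopped first, and the IDs, account and domain values are placeholders:
```
# stop the VM, then move it to the other account, passing the shared network
# IDs in the desired NIC order
stop virtualmachine id=<vm-uuid>
assign virtualmachine virtualmachineid=<vm-uuid> account=<new-account> domainid=<domain-uuid> networkids=<shared-net-1-uuid>,<shared-net-2-uuid>
```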
* server: fix cannot create vm if another vm with the same name has been added to and removed from the network
steps to reproduce the issue
(1) create vm-1 on network-1
(2) add vm-1 to network-2
(3) remove vm-1 from network-2
(4) create another vm with same name vm-1 on network-2
expected result: the operation succeeds.
actual result: the operation fails.
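A hedged cloudmonkey sketch of these steps; the IDs are placeholders:
```
# (1) create vm-1 on network-1
deploy virtualmachine name=vm-1 networkids=<network-1-uuid> serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid>

# (2) add vm-1 to network-2, then (3) remove that nic again
add nictovirtualmachine virtualmachineid=<vm-1-uuid> networkid=<network-2-uuid>
remove nicfromvirtualmachine virtualmachineid=<vm-1-uuid> nicid=<nic-uuid>

# (4) create another VM with the same name on network-2 (previously failed)
deploy virtualmachine name=vm-1 networkids=<network-2-uuid> serviceofferingid=<offering-uuid> templateid=<template-uuid> zoneid=<zone-uuid>
```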
* #4600: add back a removed line
* vpc: fix ips on wrong interfaces after rebooting vpc vrs
* #4467: Rename to updateNicWithDeviceId
* CLSTACK-8923 vr: Force a restart of keepalived if conntrackd is not running or configuration has changed
When we try to reset the site-to-site VPN connection while
the VRs are being restarted, the connection enters the
PENDING state and we can't reset the connection.
So change the state from PENDING to Disconnected.
Steps to reproduce the issue
1. create a VPC with a tier (vpc-001-001 in vpc-001), create a VM
2. create a VPC with a tier (vpc-002-001 in vpc-002) with a different CIDR, create a VM
3. create a custom gateway for both VPCs
4. enable site-to-site VPN on both VPCs, and add a VPN connection to each other; both should be "Connected"
5. restart vpc-001 with cleanup and monitor it
6. when the first router is destroyed, go to the site-to-site VPN page and reset the VPN connection.
7. we will get an error "Resource [DataCenter:1] is unreachable: Unable to apply site 2 site VPN configuration, virtual router is not in the right state"
   and the VPN connection is stuck at Pending
8. when the VPC is restarted, go to the site-to-site VPN page and reset the VPN connection.
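For reference, a hedged cloudmonkey sketch of the reset in steps 6 and 8; the connection ID is a placeholder:
```
# check the connection state
list vpnconnections filter=id,state

# reset the connection (previously stuck in Pending while the VR was restarting)
reset vpnconnection id=<vpn-connection-uuid>
```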
Co-authored-by: Rakesh Venkatesh <r.venkatesh@global.leaseweb.com>
This contains other changes:
(1) add an isrouting field for VM templates in the UI
(2) show the register URL of templates/ISOs in the UI
(3) make the 'Bootable' field changeable for existing ISOs
If a VM has a last host_id specified, CloudStack will try to start the VM on it first.
However, the host tag is checked, but the guest OS preference is not checked.
For a new VM, it will be deployed to the preferred host as we expect.
Fixes: #3554 (comment)
After a few hours running with InfluxDB configured, CloudStack hangs due to an OutOfMemoryError being raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):
```
2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
...
    at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
    at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
    at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
    at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
    at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
    at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
```
Context on InfluxDB batching: enabling batch mode on InfluxDB is great and speeds up writes, but it requires caution to avoid zombie threads.
Solution: this happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to call influxDB.close().