Commit Graph

10208 Commits

Author SHA1 Message Date
Nicolas Vazquez 13c81a8ee4 server: Prevent corner case for infinite PrepareForMaintenance (#3095)
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance

To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.

How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations' iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)
2018-12-28 15:14:16 +05:30
Anurag Awasthi a7ccbdc790 api: allow keyword search in listSSHKeyPairs (#2920) (#3098)
Adds support for keyword search that was ignored by listsshkeypairs command.

Fixes: #2920
2018-12-23 00:34:53 +05:30
Craig Squire 8d53557ba7 api: don't throttle api discovery for listApis command (#2894)
Users reported that they weren't getting all apis listed in cloudmonkey when running a sync. After some debugging, I found that the problem is that the ApiDiscoveryService is calling ApiRateLimitServiceImpl.checkAccess(), so the results of the listApis command are being truncated because Cloudstack believes the user has exceeded their API throttling rate.

I enabled throttling with a 25 request per second limit. I then created a test role with only list* permissions and assigned it to a test user. When this user calls listApis, they will typically receive anywhere from 15-18 results. Checking the logs, you see The given user has reached his/her account api limit, please retry after 218 ms..

I raised the limit to 200 requests per second, restarted the management server and tried again. This time I got 143 results and no log messages about the user being throttled.
2018-12-12 23:55:32 +05:30
Boris Stoyanov - a.k.a Bobby 44bc516609 api: move ostypeid from DB id to DB uuid, backports #2528 (#3066)
This is a backport to 4.11 of #2528
2018-11-29 22:20:51 +05:30
Paul Angus fb80e51307 Updating pom.xml version numbers for release 4.11.3.0-SNAPSHOT
Signed-off-by: Paul Angus <paul.angus@shapeblue.com>
2018-11-20 13:11:52 +00:00
Nicolas Vazquez bb7493ad4b configdrive: Add missing ConfigDrive entries on existing zones after upgrade (#3007)
After upgrade existing environments to 4.11, ConfigDrive cannot be enabled for existing zones due to missing entry on 'physical_network_service_providers' table.
2018-11-12 11:30:00 +05:30
Nicolas Vazquez 7d8eb37924 [4.11] Fix set initial reservation on public IP ranges (#2980)
* Fix initial reservation on public IP ranges

* Do not allow dedicating a system VM IP range
2018-11-07 10:48:07 -02:00
Nicolas Vazquez af0c1e48cf Fix DirectNetworkGuru canHandle checks for lowercase isolation methods (#3010) 2018-11-07 09:53:01 -02:00
Nicolas Vazquez dffb430975 kvm: Fix migrating VM from ISO failures (#2928)
Prevents errors while migrating VM from ISO:

Test 1: Deploy VM from ISO -> Live migrate VM to another host -> ERROR
Test 2: Register ISO using Direct Download on KVM -> Deploy VM from ISO -> Live migrate VM to another host -> ERROR

- Prevent NullPointerException migrating VM from ISO
- Prevent mount secondary storage on ISO direct downloads on KVM
2018-10-29 16:14:20 +05:30
Rohit Yadav e2ba934c19
server: fix unwanted txn commit warning messages (#2927)
This fixes unwanted transaction commit warning messages such:

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-29 02:49:54 +05:30
Rohit Yadav 9cf57d2568
network: on rolling restart force stop old routers (#2926)
This force stops old VRs when performing rolling restart with
cleanup=true. This will ensure that VRs are powered off quickly than
wait longer for the normal ACPI shutdown. During testing, it was found
on VMware where VM stops are slow compared to XenServer and KVM.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-25 09:20:39 +05:30
Nicolas Vazquez 5cf163d888 server: Unify templates/ISOs checksum API output (#2911)
Unify checksum API output for templates and ISOs: not list the checksum algorithm on:
KVM direct downloads

On in progress normal template downloads. The algorithm is shown on the listtemplates API, but after it is downloaded it is not shown anymore.
2018-10-21 22:33:04 +05:30
Rohit Yadav 5ce14df31f
network: Allow ability to disable rolling restart feature (#2900)
This adds a global setting for admins who may not want the rolling
restart of routers or are seeing any issues around it. In future, this
setting may be removed.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-17 20:27:08 +05:30
Nicolas Vazquez 9003c7bfdc Add checksum sanity validation on template registration (#2902)
* Add checksum sanity validation on template registration

* Refactor

* Rename checksum sanity method
2018-10-16 10:21:20 -03:00
Rohit Yadav ea771cfda4
router: Fixes #2719 program VR nics by device id order for VPC (#2888)
This fixes #2719 where private gateway IP might be incorrectly
programmed on a guest network nic. The VR would now check ipassoc
requests by mac addresses than provided nic/device id in case they are
wrong.

The root cause is that the device id information is lost when aggregated
commands are created upon starting of a new VPC VR, without the correct
device id in ip_associations json it mis-programs the VR.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-10 15:20:36 +05:30
Frank Maximus 02e2825d2d CLOUDSTACK-10380: Fix startvm giving another password after password reset. 2018-09-17 16:33:35 +02:00
Rohit Yadav 2ab3976c0d
CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated (#2829)
* CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated

Storage pool checker is not being called on resize and migrate volume.
This may lead to allocated percentage of storage above 100%.

Setup:
1 VMware cluster with 2 Hosts.

Executed Steps:

Applied the following global settings:
storage.overprovisioning.factor = 1
pool.storage.allocated.capacity.disablethreshold = 1
pool.storage.capacity.disablethreshold = 1
Restarted management server
Executed Resize and migrate pool and Observed that Storage pool checker is not performed on resizeVolume and migrateVolume.
Result:
Root cause analysis shows storage pool checker is not called when doing migration and resizing.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-07 22:01:16 +05:30
cl-k-takahashi 2c3424b478 server: fix a typo in UserVmManagerImpl.java (#2811)
Fixes typo presnt -> present

Signed-off-by: Kai Takahashi <k-takahashi@creationline.com>
2018-08-17 15:05:27 +05:30
Rohit Yadav 461c4ad027
vmware: reboot VR after mac updates (#2794)
This re-introduces the rebooting of VR after setup of nics/macs in
case of VMware. It also adds a minor enhancement to show the console
esp. for root admins when VRs and systemvms are in starting state.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-08-10 13:07:11 +05:30
Rohit Yadav f60f3cec34
router: Fixes #2789 fix proper mark based packet routing across interfaces (#2791)
Previously, the ethernet device index was used as rt_table index and
packet marking id/integer. With eth0 that is sometimes used as link-local
interface, the rt_table index `0` would fail as `0` is already defined
as a catchall (unspecified). The fwmarking on packets on eth0 with 0x0
would also fail. This fixes the routing issues, by adding 100 to the
ethernet device index so the value is a non-zero, for example then the
relationship between rt_table index and ethernet would be like:

100 -> Table_eth0 -> eth0 -> fwmark 100 or 0x64
101 -> Table_eth1 -> eth1 -> fwmark 101 or 0x65
102 -> Table_eth2 -> eth2 -> fwmark 102 or 0x66

This would maintain the legacy design of routing based on packet mark
and appropriate routing table rules per table/ids. This also fixes a
minor NPE issue around listing of snapshots.

This also backports fixes to smoketests from master.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-08-08 12:05:42 +05:30
dahn 38d0274eb4
check volumes for state when retrieving pool for configDrive creation (#2709)
* only ask for the root volume, removing extensive query

* better name
2018-07-18 13:13:41 +02:00
Khosrow Moossavi 67860d9f46 maven: Updating pom.xml version numbers for release 4.11.2.0-SNAPSHOT (#2728)
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.

Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
2018-07-06 17:27:12 +05:30
Paul Angus 8ba318da19 Updating pom.xml version numbers for release 4.11.2-SNAPSHOT
Signed-off-by: Paul Angus <paul.angus@shapeblue.com>
2018-06-26 17:53:54 +01:00
Paul Angus 2cb2dacbe7 Updating pom.xml version numbers for release 4.11.1.0
Signed-off-by: Paul Angus <paulangus@PA-Ansible-GUI.sblab.local>
2018-06-21 15:52:43 +01:00
dahn 52b02de43f vpc: reuse private gateway ip for non redundant VPC (#2712)
As rolling restart does not deallocate an IP before configuring it on a new VR, the code must allow it to be reused on a non-redundant VPCs gateway nic.
In crease ping counts to reduce intermittent failures in smoketests.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-06-21 15:06:50 +05:30
Nicolas Vazquez 539d7e10f3
Merge pull request #2493 from shapeblue/fixmaintenance
CLOUDSTACK-10326: Prevent hosts fall into Maintenance when there are running VMs on it
2018-06-20 12:00:58 -03:00
Rohit Yadav 39471c8c00
configdrive: make fewer mountpoints on hosts (#2716)
This ensure that fewer mount points are made on hosts for either
primary storagepools or secondary storagepools.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-06-20 12:25:16 +05:30
Daan Hoogland d126cd21ea comply with api key constraint 2018-06-13 16:45:30 +02:00
nvazquez faf2a7760d Add unit tests 2018-06-12 11:56:41 -03:00
nvazquez a22ab69bb6 Set host into ErrorInMaintenance in case of failure trying to enter Maintenance mode 2018-06-12 09:42:09 -03:00
nvazquez 08a8330633 CLOUDSTACK-10326: Fix for infinite loop on PrepareForMaintenance 2018-06-11 09:53:21 -03:00
nvazquez cc35f9ddb0 CLOUDSTACK-10326: Prevent hosts fall into Maintenance when there are running VMs on it 2018-06-11 09:53:20 -03:00
Frank Maximus 68d87d8f2a CLOUDSTACK-10381: Fix password reset / reset ssh key with ConfigDrive 2018-06-08 18:41:47 +02:00
Nicolas Vazquez a5856a6447 network: allow advanced zones with security groups and VXLAN isolation type (#2693)
Not possible to deploy an Advanced zone with Security Groups, and VXLAN isolation method on KVM. Exception: "Unable to convert network offering with specified id to network profile" is logged.
2018-06-08 13:13:25 +05:30
Nicolas Vazquez 76367db8fb L2: add default L2 network offerings (#2683)
Adds default L2 network offerings. Adds check for existing default L2 networks.
2018-06-07 11:23:35 +05:30
Frank Maximus 8798014ca8 CLOUDSTACK-10377: Fix Network restart for Nuage (#2672)
Changes in PR #2508 have caused network restart to fail in a Nuage setup,
as the new VR takes the same IP as the old one, and the old VR is still running.
Nuage doesn't support multiple VM's having the same IP.
We delay provisioning the interfaces in VSD until the old VR interface is released.
2018-06-06 12:17:10 +05:30
Rafael Weingärtner 9b83337658 Create unit test cases for 'ConfigDriveBuilder' class (#2674)
* Create unit test cases for 'ConfigDriveBuilder' class

* add method 'getProgramToGenerateIso' as suggested by rohit and Daan

* fix encoding for base64 to StandardCharsets.US_ASCII

* fix MockServerTest.testIsMockServerCanUpgradeConnectionToSsl()

This is another method that is causing Jenkins to fail for almost a month
2018-06-04 13:20:09 +02:00
dahn 7a3a882d12 server: Fixes #2545 revert dedicate vlan code removal (#2664)
This re-adds logic to allow dedication of public ip/range to a domain and its usage.
2018-05-23 20:40:34 +05:30
Rohit Yadav ebb22a4818 server: Calculate fresh capacity per VM (#2663)
This fixes and ensures that every VM has its capacity individually
calculated, with the initial override of 1.0f as overcommit ratio.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-23 16:20:07 +02:00
Rafael Weingärtner 8b09620d77 CLOUDSTACK-10276: listVolumes not working when storage UUID is not a UUID (#2639)
When configuring a pre-setup primary storage we can enter the name-label of the storage that is going to be used by ACS and is already set up in the host. The problem is that we can use any String of characters there, and this String does not need to be a UUID. When listing volumes from a primary storage that has such conditions, the list will return all of the volumes in the cloud because the “API framework” will ignore that value as it is not a UUID type.
2018-05-22 17:02:40 +05:30
Gabriel Beims Bräscher 02ece53375 addNicToVirtualMachine: Fixes #2540 handle invalid MAC address arg (#2653)
Look for the next available MAC address if the given MAC address in command addNicToVirtualMachine is invalid (null, empty, blank). Fixes #2540
2018-05-21 16:24:21 +05:30
Rohit Yadav acc5fdcdbd
CLOUDSTACK-10290: allow config drives on primary storage for KVM (#2651)
This introduces a new global setting `vm.configdrive.primarypool.enabled` to toggle creation/hosting of config drive iso files on primary storage, the default will be false causing them to be hosted on secondary storage. The current support is limited from hypervisor resource side and in current implementation limited to `KVM` only. The next big change is that config drive is created at a temporary location by management server and shipped to either KVM or SSVM agent via cmd-answer pattern, the data of which is not logged in logs. This saves us from adding genisoimage dependency on cloudstack-agent pkg.

The APIs to reset ssh public key, password and user-data (via update VM API) requires that VM should be shutdown. Therefore, in the refactoring I removed the case of updation of existing ISO. If there are objections I'll re-put the strategy to detach+attach new config iso as a way of updation. In the refactored implementation, the folder name is changed to lower-cased configdrive. And during VM start, migration or shutdown/removal if primary storage is enable for use, the KVM agent will handle cleanup tasks otherwise SSVM agent will handle them.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-21 14:27:23 +05:30
Nicolas Vazquez 06f7e495dc Host Affinity plugin (#2630)
This implements a new host-affinity plugin.
2018-05-21 12:49:08 +05:30
Nicolas Vazquez 9aa1743984 registerIso: Fixes #2654 register iso in all zones (#2652)
Fix to register of iso in all zones. Fixes #2654.
2018-05-21 12:26:31 +05:30
Rafael Weingärtner b9ed42bd29
Fix primary storage count when deleting volumes (#2629)
* Primary Storage count for an account does not decrease when a Data Disk is deleted

When a data disk is created and not attached in a running VM, the "deleteVolume" will not decrement the count for used primary storage in the VMs accounting information. The property that is not being decremented is called "primarystoragetotal"; this information can be retrieved via "listAccounts" API method.

Steps to reproduce this issue:
1 - Create an account, deploy a VM in it
2 - Check the primary storage count for the account with listAccounts API
3 - Create a data disk
4 - Check the primary storage count for the account with listAccounts API
5 - Delete the Data disk
6 - Check the primary storage count for the account with listAccounts API - It is the same as before deleting the data disk (it should not be the same as the value in step 2!)

* formatting and cleanups

* fix imports that were wrongly changed during rebase
2018-05-16 15:28:28 -03:00
Rohit Yadav f663b926c7
config-drive: use hostname of VM instance of internal VM id (#2645)
This fixes config drive to use VM's user provided host-name instead of
the internal VM instance ID for hostname related config in both
cloudstack and openstack metadata bundled in the ISO.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-16 13:19:21 +05:30
Rohit Yadav 1b3046e376
CLOUDSTACK-9184: Fixes #2631 VMware dvs portgroup autogrowth (#2634)
* CLOUDSTACK-9184: Fixes #2631 VMware dvs portgroup autogrowth

This deprecates the vmware.ports.per.dvportgroup global setting.

The vSphere Auto Expand feature (introduced in vSphere 5.0) will take
care of dynamically increasing/decreasing the dvPorts when running out
of distributed ports . But in case of vSphere 4.1/4.0 (If used), as this
feature is not there, the new default value (=> 8) have an impact in the
existing deployments. Action item for vSphere 4.1/4.0: Admin should
modify the global configuration setting "vmware.ports.per.dvportgroup"
from 8 to any number based on their environment because the proposal
default value of 8 would be very less without auto expand feature in
general. The current default value of 256 may not need immediate
modification after deployment, but 8 would be very less which means
admin need to update immediately after upgrade.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-11 22:16:13 +05:30
Rohit Yadav a77ed56b86
CLOUDSTACK-9114: Reduce VR downtime during network restart (#2508)
This introduces a rolling restart of VRs when networks are restarted
with cleanup option for isolated and VPC networks. A make redundant option is
shown for isolated networks now in UI.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-11 12:48:07 +05:30
Nicolas Vazquez bd89760108 config-drive: support user data on L2 networks (#2615)
Supporting ConfigDrive user data on L2 networks.
Add UI checkbox to create L2 network offering with config drive.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-09 21:33:11 +05:30
Rohit Yadav 253f7d7728
listostypes: Fixes #2529 return boolean than string in response (#2632)
This returns the boolean value of the `isuserdefined` key than
converting it to string. Fixes #2529.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-09 18:03:09 +05:30