Commit Graph

31464 Commits

Author SHA1 Message Date
Rohit Yadav 03e52ac317
systemvm: remove unknown default ssh public key and fix centos6 unary issue (#73)
This fixes a unary comparison issue on older bash (centos6) and removes unknown default ssh public key.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-23 13:21:28 +05:30
Anurag Awasthi b24dc2027c FIX4: Allow ready volumes to be attached to VMs that have never been started once (#71)
With this patch apache/cloudstack@b766bf7 we started tracking disks in attaching state so that other attach request can fail gracefully. However this missed the case where disks were in allocated state but attach was requested.

For the use case where users want to attach disk in allocated state but not ready, we need to have allocated-attaching transition as well. We must take care of returning to the original state - allocated or ready - when attach request has completed.

For the use case of unstarted vm's the disk must proceed as follows - "Allocated" -> Attaching -> Allocated. When VM is started, the disk is "created" and pool is assigned. For the use case of started VMs it's more trivial and disk proceeds as follows - Ready -> Attaching -> Ready.

Test this by creating a VM with "startvm=false", create a disk and try attaching it in allocated state. It would give an exception on latest 4.11 but will be fixed on this patch.
2019-04-23 13:10:09 +05:30
Rohit Yadav 6a82134fe9
FIX5: new qemu-guest-agent based patching for KVM (#72)
This introduces a new patching script for patching systemvms on KVM
using qemu-guest-agent that runs inside the systemvm on startup. This
also removes the vport device which was previously used by the legacy
patching script and instead uses the modern and new uniform guest
agent vport for host-guest communication.

Also updates the sytemvmtemplate build config to use the latest Debian
9.8.0 iso.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-22 16:30:14 +05:30
Rohit Yadav 3e5b9e20be packaging: systemctl daemon-reload after agent install or upgrade (#3269)
This runs systemctl daemon-reload after cloudstack-agent is installed
or upgraded.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 96611fc640)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:18:39 +05:30
Rohit Yadav 8dcdb23cd6 packaging: don't skip unit tests while building packages (#3266)
This may slow down CI and release, but ensures that unit tests always
run as part of the packaging build process.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 55efaf14d9)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:18:33 +05:30
Rohit Yadav 58feb8aa6b client: don't disable TLSv1, TLSv1.1 by default that breaks VMware env
This fixes the issue that TLSv1 and TLSv1.1 are still used by CloudStack
management server to communicate with VMware vCenter server. With the
current defaults, the setup/deployment on VMware fails. Users/admins
can however setup the security file according to their env needs to
disable TLSv1 and TLSv1.1 for server sockets (8250/agent service for
example).

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 6e51bde228)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:18:28 +05:30
Rohit Yadav e7d8730836 systemd: Fix -Dpid arg passing to systemd usage service (#3210)
* systemd: Fix -Dpid arg passing to systemd usage service

This fixes regression introduced by refactoring PR #3163 where `-Dpid`
was incorrectly passed string `$$` instead of parent PID integer.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* fix systemd limitation, exec using /bin/sh instead and wrap in ${} syntax

https://www.freedesktop.org/software/systemd/man/systemd.service.html#Command%20lines

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* usage: don't hide exception from Gabriel's https://github.com/apache/cloudstack/pull/3207/files#diff-062fcf5ae32de59dfd6cd4f780e1d7cd

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit f7327c7457)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:18:21 +05:30
Rohit Yadav 19172d4da4 systemd: fix services to allow TLS configurations via java.security.ciphers (#3163)
* systemd: fix services to allow TLS configurations via java.security.ciphers

This fixes the management server and systemd services to allow the
java.security.ciphers file to configure disabled TLS protocols and
algorithms. This also cleans up systemd service files for agent and
usage server.

This fixes #3140

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* configure: fix travis failure due pycodestyle error

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit cb3fed0e4e)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:17:23 +05:30
Rohit Yadav ffee81c884 packaging: management default file cleanup (#3139)
This cleanups management server default file, the `cloud.jks` is no
longer created by the management server but instead created in-memory
by the root CA plugin on management server startup.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 463372bc7e)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:16:03 +05:30
Rohit Yadav 6f247323cc scripts: don't restart systemvm cloud.service post cert import (#3134)
This ensures that the systemvm agent (cloud.service) is not restarted
on certificate import. The agent has an inbuilt logic to attempt reconnection.
If the old certificates/keystore is invalid agent will attempt reconnection.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 53ec27cd7a)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:15:58 +05:30
Nicolas Vazquez 8195f0f488 server: Prevent corner case for infinite PrepareForMaintenance (#3095)
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance

To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.

How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations' iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)

(cherry picked from commit 13c81a8ee4)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:15:52 +05:30
Rohit Yadav db6cf82d8c security: increase keystore setup/import timeout (#3076)
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 89c567add8)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-04-16 22:15:47 +05:30
Paul Angus 5aae410dfc Updating pom.xml version numbers for release 4.11.2.0
Signed-off-by: Paul Angus <paul.angus@shapeblue.com>
2018-11-13 09:24:27 +00:00
Rohit Yadav 4d8e75cec8
systemvmtemplate: update debian 9.6 iso url and checksum (#3022)
This fixes the failing systemvmtemplate build due to 404 on old ISO url.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-11-13 06:30:09 +05:30
Bitworks LLC f6e600e4d8 CLOUDSTACK-3009: Fix resource calculation CPU, RAM for accounts. (#3012)
The view "service_offering_view" doesn't include removed SOs, as a result when SO is removed, the bug happens. The PR introduces a change for resource calculation changing "service_offering_view" to "service_offering" table which has all service offerings.

Must be fixed in:

4.12
4.11
Fixes: #3009
2018-11-13 06:29:08 +05:30
Paul Angus f95aec4a84
Merge pull request #3018 from shapeblue/fixrouterfilecreation
Prevent error on GroupAnswers on VR creation
2018-11-12 13:25:08 +00:00
Nicolas Vazquez bb7493ad4b configdrive: Add missing ConfigDrive entries on existing zones after upgrade (#3007)
After upgrade existing environments to 4.11, ConfigDrive cannot be enabled for existing zones due to missing entry on 'physical_network_service_providers' table.
2018-11-12 11:30:00 +05:30
nvazquez dea0b3eb78 Prevent error on GroupAnswers on VR creation 2018-11-09 15:30:57 -03:00
Nicolas Vazquez 7d8eb37924 [4.11] Fix set initial reservation on public IP ranges (#2980)
* Fix initial reservation on public IP ranges

* Do not allow dedicating a system VM IP range
2018-11-07 10:48:07 -02:00
Nicolas Vazquez af0c1e48cf Fix DirectNetworkGuru canHandle checks for lowercase isolation methods (#3010) 2018-11-07 09:53:01 -02:00
Rohit Yadav c6e53f6cc6
kvm: reset KVM host on heartbeat failure (#2984)
On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-30 15:13:59 +05:30
Rene Diepstraten f7bc5807a3 vr: defer was broken in VR because of json name change
Committed at f0491d5c72 (#2979).

After upgrade from CS 4.10 to CS 4.11, multiple VRs did not start through.
It did not properly defer the finalize config in update_config.py.
Apparently, the json files are now called differently: where it used to
be vm_dhcp_entry.json it now has a uuid added, for example
vm_metadata.json.4d727b6e-2b48-49df-81c3-b8532f3d6745.
The if statement that checks if the finalize can be safely deferred
therefore no longer matches. This PR contains a fix so finalize is
defered again.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-29 16:19:33 +05:30
Nicolas Vazquez dffb430975 kvm: Fix migrating VM from ISO failures (#2928)
Prevents errors while migrating VM from ISO:

Test 1: Deploy VM from ISO -> Live migrate VM to another host -> ERROR
Test 2: Register ISO using Direct Download on KVM -> Deploy VM from ISO -> Live migrate VM to another host -> ERROR

- Prevent NullPointerException migrating VM from ISO
- Prevent mount secondary storage on ISO direct downloads on KVM
2018-10-29 16:14:20 +05:30
Rohit Yadav f0491d5c72
vr: defer was broken in VR because of json name change (#2979)
After upgrade from CS 4.10 to CS 4.11, multiple VRs did not start through.
It did not properly defer the finalize config in update_config.py.
Apparently, the json files are now called differently: where it used to
be vm_dhcp_entry.json it now has a uuid added, for example
vm_metadata.json.4d727b6e-2b48-49df-81c3-b8532f3d6745.
The if statement that checks if the finalize can be safely deferred
therefore no longer matches. This PR contains a fix so finalize is
defered again.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-29 16:11:43 +05:30
Rohit Yadav e2ba934c19
server: fix unwanted txn commit warning messages (#2927)
This fixes unwanted transaction commit warning messages such:

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-29 02:49:54 +05:30
alexanderbazhenoff a87acf93d8 kvm: improved performance on creating VM (#2923)
Improved performance on creating VM for KVM virtualization.

On a huge hosts every "ifconfig | grep" takes a lot of time (about 2.5-3 minutes on hosts with 500 machines). For example: ip link show dev $vlanDev > /dev/null is faster than ifconfig |grep -w $vlanDev > /dev/null. But using ip command is much better. Using this patch you can create 500s machine in 10 seconds. You don't need slow ifconfig prints anymore.
2018-10-25 16:28:13 +05:30
Rohit Yadav 9cf57d2568
network: on rolling restart force stop old routers (#2926)
This force stops old VRs when performing rolling restart with
cleanup=true. This will ensure that VRs are powered off quickly than
wait longer for the normal ACPI shutdown. During testing, it was found
on VMware where VM stops are slow compared to XenServer and KVM.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-25 09:20:39 +05:30
Rohit Yadav 9b35b64b3c
packaging: install plugins at /usr/share/cloudstack-management/lib (#2915)
Install any additional plugin jars in the lib directory to be picked up
by the classpath builder, otherwise one has to manually add the jar
to /etc/default/cloudstack-management after installation. This fixes
the issue for `mysql-ha` plugin.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-24 18:21:03 +05:30
Rohit Yadav e092529c98
systemvm: Ensure cloud service reboots after failure (#2916)
This fixes an issue for systemvms (CPVM and SSVM) on VMware, as eth0
is not programmed (link-local) the networking.service fails to start
which is a dependency for cloud-postinit service. When cloud-postinit
service fails to start/run, it fails to start the agent (cloud) process.
This fixes the smoketest failures we saw in case of VMware 6.5 with
4.11.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-23 23:33:08 +05:30
Rohit Yadav 47c9c1cb58
client: mgmt server listen default to 0.0.0.0 (#2907)
This makes the management server listen on all interfaces by default.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-22 20:00:51 +05:30
Nicolas Vazquez 5cf163d888 server: Unify templates/ISOs checksum API output (#2911)
Unify checksum API output for templates and ISOs: not list the checksum algorithm on:
KVM direct downloads

On in progress normal template downloads. The algorithm is shown on the listtemplates API, but after it is downloaded it is not shown anymore.
2018-10-21 22:33:04 +05:30
Rohit Yadav 5ce14df31f
network: Allow ability to disable rolling restart feature (#2900)
This adds a global setting for admins who may not want the rolling
restart of routers or are seeing any issues around it. In future, this
setting may be removed.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-17 20:27:08 +05:30
Rohit Yadav 1904a70512
agent: on shutdown don't allow server reconnection (#2904)
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).

This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-17 06:31:13 +05:30
Nicolas Vazquez 9003c7bfdc Add checksum sanity validation on template registration (#2902)
* Add checksum sanity validation on template registration

* Refactor

* Rename checksum sanity method
2018-10-16 10:21:20 -03:00
Nicolas Vazquez 11d83fab43 agent: set log level to INFO as default for http wire (#2903)
Avoid logging bytes on direct download on KVM.
2018-10-16 10:32:03 +05:30
Rohit Yadav 933ee23104
vr: memory and swap optimizations (#2892)
This tries to provide a threshold based fix for #2873 where swappinness of VR is not used until last resort. By limiting swappiness unless actually needed, the VR system degradation can be avoided for most cases. The other change is around not starting baremetal-vr by default on all VRs, according to the spec https://cwiki.apache.org/confluence/display/CLOUDSTACK/Baremetal+Advanced+Networking+Support only vmware VRs need to run it and that too only as the last step of the setup/completion, so we don't need to run it all the time.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-16 10:29:48 +05:30
Rohit Yadav 63f4d852d5 PULL_REQUEST_TEMPLATE: simplify and remove unpopular sections (#2876)
This removes the section from the pull request template that is not very
popular or filled by the PR author.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-15 15:21:13 -03:00
Rohit Yadav ea771cfda4
router: Fixes #2719 program VR nics by device id order for VPC (#2888)
This fixes #2719 where private gateway IP might be incorrectly
programmed on a guest network nic. The VR would now check ipassoc
requests by mac addresses than provided nic/device id in case they are
wrong.

The root cause is that the device id information is lost when aggregated
commands are created upon starting of a new VPC VR, without the correct
device id in ip_associations json it mis-programs the VR.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-10 15:20:36 +05:30
Frank Maximus a6196b0a60 Fixes: #2881 Improve Exception message (#2889)
Network.Service and Network.Provider were missing a toString() method.
Added this so appending (a list of) them will be understandable.
2018-10-09 15:43:48 +05:30
Paul Angus 37ecfe2d28
Merge pull request #2884 from shapeblue/usage-server-timstamp
add date to usage server logs

Merged based on 2x LGTM and checking errors in smoke tests - none in any way related to the logging output change.
2018-10-08 15:20:51 -04:00
Rohit Yadav f430f41edd
ca: Fixes #2877 mgmt server cert should have all addrs of default nic (#2879)
This fixes the default RootCA provider implementation to initiate
and issue certificate for mgmt server on startup for all the IP addresses
on the default nic of that host.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-07 21:07:10 +05:30
Paul Angus 35656553ca add date to usage server logs 2018-10-06 17:20:17 +01:00
Simon Weller 5db65a6363 kvm: Fixes #2868 libvirt resize notify failure (#2878)
Incorrect diskpath information was being sent to virsh blockresize, so the block device size was never refreshed to reflect the new disk size.
Fixes #2868
2018-10-05 18:35:09 +05:30
Rohit Yadav 0c943ab1f0
CertUtils: export private key to pem format correctly (#2875)
This makes openssl rsa -in <file> -check pass, due to "RSA" string the
validate of private key (pem file) by openssl fails. Also removes
a commented import.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-05 04:45:47 +05:30
René Moser 8c0b9d6202 systemvm: baremetal-vr: reduce memory usage (#2866)
We see a suspicious continuous increase in memory usage. Kind of looks like a memory leak.

One thing noted during debugging is that flask is started in debug mode. This is not best practice for a production system.
2018-10-03 16:38:32 +05:30
Paul Angus fe10e684f9
Merge pull request #2743 from nuagenetworks/bugfix/marvin_config_drive
CLOUDSTACK-10380: Fix startvm giving another pw after pw reset
2018-09-26 10:21:52 -04:00
Rohit Yadav c2f4b3653d
packaging: Fixes #2857 don't overwrite agent logrotate config (#2860)
This makes the agent logrotate config to `noreplace` so on upgrade
any changes to the file are not lost.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-25 11:25:06 -04:00
Rohit Yadav 6f1c5551fc
agent: Fixes #2858 agent LB not working (#2859)
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-22 14:40:18 +05:30
Frank Maximus cca25055fa Handle review comments 2018-09-21 14:01:35 +02:00
Rohit Yadav 70dbfa7883
systemvm: export $TYPE before patching ssvm/cpvm (#2855)
This fixes a regression introduced in #2799, by exporting $TYPE
before the `patch` is called to patch/extract archives for ssvm/cpvm.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-21 14:19:18 +05:30