Commit Graph

376 Commits

Author SHA1 Message Date
John Bampton 7d23a0a759
Fix spelling (#6272) 2022-07-05 09:08:53 +02:00
nvazquez 96594aec28
Merge branch '4.16' 2022-05-23 08:16:52 -03:00
Nicolas Vazquez dc975dff95
[KVM] Enable IOURING only when it is available on the host (#6399)
* [KVM] Disable IOURING by default on agents

* Refactor

* Remove agent property for iouring

* Restore property

* Refactor suse check and enable on ubuntu by default

* Refactor irrespective of guest OS

* Improvement

* Logs and new path

* Refactor condition to enable iouring

* Improve condition

* Refactor property check

* Improvement

* Doc comment

* Extend comment

* Move method

* Add log
2022-05-23 08:11:14 -03:00
Wei Zhou 8f39a049bb
agent: enable ssl only for kvm agent (not in system vms) (#6371)
* agent: enable ssl only for kvm agent (not in system vms)

* Revert "agent: enable ssl only for kvm agent (not in system vms)"

This reverts commit b2d76bad2e.

* Revert "KVM: Enable SSL if keystore exists (#6200)"

This reverts commit 4525f8c8e7.

* KVM: Enable SSL if keystore exists in LibvirtComputingResource.java
2022-05-12 07:01:55 -03:00
Wei Zhou 4525f8c8e7
KVM: Enable SSL if keystore exists (#6200)
* KVM: Enable SSL if keystore exists

* Update #6200: add logs if no passphrase or no keystore
2022-04-22 11:51:21 -03:00
Pearl Dsilva 830f3061bc
SystemVM optimizations (#5831)
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency

* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp

* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup

* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes

* Add ssh to k8s nodes details in the Access tab on the UI

* test

* Refactor ca/cert patching logic

* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script

* remove all references of systemvm.iso

* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs

* fix script timeout

* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand

* remove commented code + change core user to cloud for cks nodes

* Update ownership of ssh directory

* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)

* Add UI changes + move changes from patch file to runcmd

* test: validate performance for template modification during seeding

* create vms folder in cloudstack-commons directory - debian rules

* remove logic for on the fly template convert + update k8s test

* fix syntax issue - causing issue with shared network tests

* Code cleanup

* refactor patching logic - certs

* move logic of fixing rootdiskcontroller from upgrade to kubernetes service

* add livepatch option to restart network & vpc

* smooth upgrade of cks clusters

* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency

* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configuratble sleep param to CitrixResourceBase::connect() used to verify if telnet to specifc port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp

* - Support to patch SystemVMs - VMWare
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup

* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes

* Add ssh to k8s nodes details in the Access tab on the UI

* test

* Refactor ca/cert patching logic

* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script

* remove all references of systemvm.iso

* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs

* fix script timeout

* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand

* remove commented code + change core user to cloud for cks nodes

* Update ownership of ssh directory

* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)

* Add UI changes + move changes from patch file to runcmd

* test: validate performance for template modification during seeding

* create vms folder in cloudstack-commons directory - debian rules

* remove logic for on the fly template convert + update k8s test

* fix syntax issue - causing issue with shared network tests

* Code cleanup

* add cgroup config for containerd

* add systemd config for kubelet

* add additional info during image registry config

* address comments

* add temp links of download.cloudstack.org

* address part of the comments

* address comments

* update containerd config - as version has upgraded to 1.5 from 1.4.12 in 4.17.0

* address comments - simplify

* fix vue3 related icon changes

* allow network commands when router template version is lower but is patched

* add internal LB to the list of routers to be patched on network restart with live patch

* add unit tests for API param validations and new helper utilities - file scp & checksum validations

* perform patching only for non-user i.e., system VMs

* add test to validate params

* remove unused import

* add column to domain_router to display software version and support networkrestart with livePatch from router view

* Requires upgrade column to consider package (cloud-scripts) checksum to identify if true/false

* use router software version instead of checksum

* show N/A if no software version reported i.e., in upgraded envs

* fix deb failure

* update pom to official links of systemVM template
2022-04-21 13:40:19 -03:00
nvazquez 65dc2df896
Merge branch '4.16' 2022-04-13 08:45:48 -03:00
slavkap 42a92dcdd3
Extract the IO_URING configuration into the agent.properties (#6253)
When using advanced virtualization the IO Driver is not supported. The
admin will decide if want to enable/disable this configuration from
agent.properties file. The default value is true
2022-04-13 08:43:35 -03:00
John Bampton 15937369fe
Fix spelling (#6161)
* Fix spelling

* Fix spelling
2022-03-29 00:21:07 -03:00
John Bampton 6401c850b7
Fix spelling (#6064)
* Fix spelling

- `interupted` to `interrupted`
- `paramter` to `parameter`

* Fix more typos
2022-03-08 13:02:35 -03:00
Suresh Kumar Anaparti b50542a11c
Merge branch '4.16' into main 2022-02-15 19:26:04 +05:30
Pearl Dsilva e0a5df50ce
CKS Enhancements and SystemVM template upgrade improvements (#5863)
* This PR/commit comprises of the following:
- Support to fallback on the older systemVM template in case of no change in template across ACS versions
- Update core user to cloud in CKS
- Display details of accessing CKS nodes in the UI - K8s Access tab
- Update systemvm template from debian 11 to debian 11.2
- Update letsencrypt cert
- Remove docker dependency as from ACS 4.16 onward k8s has deprecated support for docker - use containerd as container runtime

* support for private registry - containerd

* Enable updating template type (only) for system owned templates via UI

* edit indents

* Address comments and move cmd from patch file to cloud-init runcmd

* temporary change

* update k8s test to use k8s version 1.21.5 (instead of 1.21.3 - due to https://github.com/kubernetes/kubernetes/pull/104530)

* support for private registry - containerd

* Enable updating template type (only) for system owned templates via UI

* smooth upgrade of cks clusters

* update pom file with temp download.cloudstack.org testing links

* fix pom

* add cgroup config for containerd

* add systemd config for kubelet

* add additional info during image registry config

* update to official links
2022-02-15 18:27:14 +05:30
Daniel Augusto Veronezi Salvador b4aabadc4d
Replace string libraries with org.apache.commons.lang3.StringUtils (#5386)
* Replace google lib for lang3 and adjust methods calls

* Replace string libs by lang3

* Prohibit others string libs

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-11-18 13:41:48 +05:30
Daniel Augusto Veronezi Salvador 159c72fa97
Externalize KVM Agent's option to change migration thread timeout (#4570)
* Externalize KVM Agent's option to change migration thread timeout

* Update javadoc

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-26 08:18:17 -03:00
Daniel Augusto Veronezi Salvador 83c0b61ab2
Externalize KVM Agent storage's reboot configuration (#4586)
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-24 01:06:00 -03:00
Daniel Augusto Veronezi Salvador 349120f7c5
Externalize config to enable manually setting CPU topology on KVM VM (#5273)
* Externalize config to enable manually setting CPU topology on KVM VM

* Change log level

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-15 23:52:50 -03:00
Daniel Augusto Veronezi Salvador 7b752c3077
Externalize KVM Agent storage's timeout configuration (#5239)
* Externalize KVM Agent storage's timeout configuration

* Address @nvazquez review

* Add empty line at the end of the agent.properties file

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-28 15:45:27 +02:00
dahn 6f93e5cd08
Revert "Externalize kvm agent storage timeout configuration (#4585)" (#5218)
This reverts commit 05a978c249.
2021-07-20 09:16:43 +02:00
Daniel Augusto Veronezi Salvador 05a978c249
Externalize kvm agent storage timeout configuration (#4585)
* Externalize KVM Agent storage's timeout configuration

Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.

* Refactored KVHAMonitor nested thread and changed some logs

* It has been added the timeout's config in the agent.properties file

* Rename classes

* Rename var and remove comment

* Fix typo with word "heartbeat"

* Extract multiple methods call to variables

* Add unit tests to file handler

* Increase info about the property

* Create inner class Property

* Rename method getProperty to getPropertyValue

* Remove copyright

* Remove copyright

* Extract code to createHeartBeatCommand

* Change method access from protected to private

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-19 09:41:50 +02:00
Wei Zhou df4103f0d1
novnc: Add source IP check (#4736)
* novnc: Add client IP check for novnc console in cloudstack 4.16

* novnc ip check : Fix restart CPVM or mgt server does not update novnc param

* novnc ip check: move to method
2021-03-06 15:08:34 +05:30
Rakesh 6c88e9afb3
Dont add host back after agent service restart (#4228) 2020-10-14 16:49:39 +00:00
davidjumani 3872bf1ff9
kvm: Enable PVLAN support on L2 networks (#4040)
This is an extention of #3732 for kvm.
This is restricted to ovs > 2.9.2
Since Xen uses ovs 2.6, pvlan is unsupported.
This also fixes issues of vms on the same pvlan unable to communicate if they're on the same host
2020-08-20 15:46:34 +05:30
Spaceman1984 b586eb22f1
Human readable sizes in logs (#4207)
This PR adds outputting human readable byte sizes in the management server logs, agent logs, and usage records. A non-dynamic global variable is added (display.human.readable.sizes) to control switching this feature on and off. This setting is sent to the agent on connection and is only read from the database when the management server is started up. The setting is kept in memory by the use of a static field on the NumbersUtil class and is available throughout the codebase.

Instead of seeing things like:
2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"106496","bytesReceived":"0","result":"true","details":"","wait":"0",}}] }

The KB MB and GB values will be printed out:

2020-07-23 15:31:58,593 DEBUG [c.c.a.t.Request] (AgentManager-Handler-12:null) (logid:) Seq 8-1863645820801253428: Processing: { Ans: , MgmtId: 52238089807, via: 8, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-224-VM","bytesSent":"(104.00 KB) 106496","bytesReceived":"(0 bytes) 0","result":"true","details":"","wait":"0",}}] }

FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Human+Readable+Byte+sizes
2020-08-13 15:55:16 +05:30
harikrishna-patnala a279d5c453
logging: Logging framework to use only log4j (#4003)
Currently CloudStack is using logging frameworks as log4j and Java util logging, logging wrappers as slf4j and Apache common logging.
Here changes are to made it uniform, using only log4j framework.
Removed Java util logging, slf4j and Apache common logging.
2020-06-17 07:11:23 +05:30
Nicolas Vazquez 73122fd0a9
[KVM] Direct download agnostic of the storage provider (#3828)
* Remove constraint for NFS storage

* Add new property on agent.properties

* Add free disk space on the host prior template download

* Add unit tests for the free space check

* Fix free space check - retrieve avaiable size in bytes

* Update default location for direct download

* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope

* Verify location for temporary download exists before checking free space

* In progress - refactor and extension

* Refactor and fix

* Last fixes and marvin tests

* Remove unused test file

* Improve logging

* Change default path for direct download

* Fix upload certificate

* Fix ISO failure after retry

* Fix metalink filename mismatch error

* Fix iso direct download

* Fix for direct download ISOs on local storage and shared mount point

* Last fix iso

* Fix VM migration with ISO

* Refactor volume migration to remove secondary storage intermediate

* Fix simulator issue
2020-03-06 19:56:54 +01:00
Abhishek Kumar 0ad2370baf
Enable Direct Download for System VMs (#3731)
* changes for configurable timeouts for direct download

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* server: refactor direct download config value retrieval

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactored direc download cmd, downloader classes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* server, services: allow direct download template for SSVM, CPVM

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* list bypassed system templates

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* ignore direct download template during system tempalte download

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add direct download entry while adding store

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix previous change, donot add multiple entries for direct download

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* connection request timeout as hidden configuration

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix template zone ref cleanup on zone deletion

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix previous commit test error, change implementation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactored zone template cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2020-02-26 13:38:31 +01:00
Rohit Yadav d90341ebf1
cloudstack: add JDK11 support (#3601)
This adds support for JDK11 in CloudStack 4.14+:

- Fixes code to build against JDK11
- Bump to Debian 9 systemvmtemplate with openjdk-11
- Fix Travis to run smoketests against openjdk-11
- Use maven provided jdk11 compatible mysql-connector-java
- Remove old agent init.d scripts

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-02-12 12:58:25 +05:30
nvazquez bea627a52e Merge branch '4.11' into 4.12 2019-06-04 09:06:09 -03:00
Nicolas Vazquez 12c850ed2f
KVM: Improvements on upload direct download certificates (#2995)
* Improvements on upload direct download certificates

* Move upload direct download certificate logic to KVM plugin

* Extend unit test certificate expiration days

* Add marvin tests and command to revoke certificates

* Review comments

* Do not include revoke certificates API
2019-06-04 03:08:31 -03:00
Daan Hoogland 29918e25e3 Merge release branch 4.11 to 4.12
* 4.11:
  KVM: Fix agents dont reconnect post maintenance (#3239)
2019-05-23 14:29:41 +02:00
Nicolas Vazquez e86f671c8e KVM: Fix agents dont reconnect post maintenance (#3239)
* Keep connection alive when on maintenance

* Refactor cancel maintenance and unit tests

* Add marvin tests

* Refactor

* Changing the way we get ssh credentials

* Add check on SSH restart and improve marvin tests
2019-05-23 14:13:17 +02:00
Rohit Yadav 52f68a273a Merge remote-tracking branch 'origin/4.11'
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-12-04 16:39:21 +05:30
Rohit Yadav 89c567add8
security: increase keystore setup/import timeout (#3076)
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-12-04 01:28:24 +05:30
Rohit Yadav e871638dc8 Merge remote-tracking branch 'origin/4.11' 2018-10-17 13:25:06 +05:30
Rohit Yadav 1904a70512
agent: on shutdown don't allow server reconnection (#2904)
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).

This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-17 06:31:13 +05:30
Rohit Yadav 1f9811db8d Merge remote-tracking branch 'origin/4.11' 2018-09-24 20:05:08 +05:30
Rohit Yadav 6f1c5551fc
agent: Fixes #2858 agent LB not working (#2859)
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-22 14:40:18 +05:30
Rohit Yadav 1d132d0e58 Merge branch '4.11' 2018-06-08 13:45:31 +05:30
Rohit Yadav 779649f5ee
agent: Avoid sudo, renew certificates assuming root (#2697)
In some environments running the keystore cert renewal (as root user)
over an already connected agent connection may cause exception
such as: `sudo: sorry, you must have a tty to run sudo`. Since, all
agents - KVM, CPVM and SSVM run as root user, we don't need to run
the renewal scripts with sudo.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-06-08 13:07:34 +05:30
Rohit Yadav 4661daa1dd Merge branch '4.11': Fixes #2633 don't block agent for pending tasks on reconnection (#2638)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-16 15:36:10 +05:30
Rohit Yadav d893fb5b00
agent: Fixes #2633 don't wait for pending tasks on reconnection (#2638)
When agent loses connection with management server, the reconnection
logic waits for any pending tasks to finish. However, when such tasks
do finish they fail to send an `Answer` back to managements server.
Therefore from a management server's perspective such pending
operations are stuck in a FSM state and need manual removal or fixing.
This is by design where management server's side cmd-answer request
pattern is code/execution dependent, therefore even if the answer
were to be sent when management server came back up (reconnects)
the management server will fail to acknowledge and process the answer
due to missing listeners or being in the exact state to handle answers.

Historically, the Agent would wait to reconnect until the internal
tasks complete but I found no reason why it should wait for reconnection
at all.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-16 15:35:00 +05:30
Rohit Yadav 644b0910cd Merge branch '4.11'
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-04-20 00:46:43 +05:30
Rohit Yadav 8da2462469
CLOUDSTACK-10333: Secure Live VM Migration for KVM (#2505)
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.

Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.

Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.

The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:

    listen_tcp=0
    listen_tls=1
    tcp_port="16509"
    tls_port="16514"
    auth_tcp="none"
    auth_tls="none"
    key_file = "/etc/pki/libvirt/private/serverkey.pem"
    cert_file = "/etc/pki/libvirt/servercert.pem"
    ca_file = "/etc/pki/CA/cacert.pem"

For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.

There are no API or DB changes.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-04-20 00:36:18 +05:30
lzh3636 fed3492b57 CLOUDSTACK-10357: Improve log messages in methods (#2580)
Fix several logs that mismatch method.
Add stacktraces for throw new statements.
2018-04-20 00:33:27 +05:30
nvazquez 1c99fd7388 Merge branch '4.11' 2018-03-21 08:12:59 -03:00
Nicolas Vazquez 6a75423779 CLOUDSTACK-10231: Asserted fixes for Direct Download on KVM (#2408)
Several fixes addressed:

- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
2018-03-20 19:24:46 +05:30
Rohit Yadav 8ef131745a Merge branch '4.11' 2018-03-15 16:46:50 +05:30
Rohit Yadav 30175d6879
CLOUDSTACK-10132: Extend support for management servers LB for agents (#2469)
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.

This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).

This FR introduces two new global settings:

- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
  for the agent's background task that checks and switches to agent's
  preferred host.

The indirect.agent.lb.algorithm supports following algorithm options:

- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
  host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).

Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.

Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
  'indirect.agent.lb.algorithm' global settings.

On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.

First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.

From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.

Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.

Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-03-15 16:34:03 +05:30
Rafael Weingärtner d0ec2611f7 Forward merge #2454 merged on '4.11' branch
[CLOUDSTACK-10283] Sudo to setup agent keystore, fail on host add.
2018-02-22 19:47:47 -03:00
Rohit Yadav f1cf5f97e9 CLOUDSTACK-10283: Sudo to setup agent keystore, fail on host add failure
This would make keystore utility scripts being executed as sudoer
in case the process uid/owner is not root but still a sudoer user.

Also fails addHost while securing a KVM host and if keystore fails to be
setup for any reason.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-02-14 13:08:20 +01:00