CLOUDSTACK-6975: Prevent dnsmasq from starting on backup redundant RvRRebase of PR #1509 against the 4.7 branch as requested by @swill
One LGTM from @ustcweizhou carried from previous PR. Previous PR will be closed.
Description from PR #1509:
CLOUDSTACK-6975 refers to service monitoring bringing up dnsmasq but this is no-longer accurate, as service monitoring is not active on the post-4.6 routers. These routers still suffer an essentially identical issue, however, because "dnsmasq needs to be restarted each time configure.py is called in order to avoid lease problems." As such, dnsmasq is still running on backup RvRs, causing the issues described in CLOUDSTACK-6975.
This PR is based on a patch submitted by @ustcweizhou. The code now checks the redundant state of the router before restarting dnsmasq.
RvR networks without this patch have dnsmasq running on both master and backup routers. RvR networks with this patch have dnsmasq running on only the master router.
* pr/1514:
CLOUDSTACK-6975: Prevent dnsmasq from starting on backup redundant RvR.
Signed-off-by: Will Stevens <williamstevens@gmail.com>
SystemVM cleanupsfrom the logrotate docs
> size - With this, the log file is rotated when the specified size is reached. Size may be specified in bytes (default), kilobytes (sizek), or megabytes (sizem).
> Note: If size and time interval options are specified at same time, only size option take effect. it causes log files to be rotated without regard for the last rotation time. If both log size and timestamp of a log file need to be considered by logrotate, the minsize option should be used. logrotate will rotate log file when they grow bigger than minsize, but not before the additionally specified time interval.
* pr/1414:
systemvm, logrotate: remove daily explicitly as it is ignored
Signed-off-by: Will Stevens <williamstevens@gmail.com>
* 4.7:
Fix Sync of template.properties in Swift
Configure rVPC for router.redundant.vrrp.interval advert_int setting
Have rVPCs use the router.redundant.vrrp.interval setting
Resolve conflict as forceencap is already in master
Split the cidr lists so we won't hit the iptables-resture limits
Check the existence of 'forceencap' parameter before use
Do not load previous firewall rules as we replace everyhing anyway
Wait for dnsmasq to finish restart
Remove duplicate spaces, and thus duplicate rules.
Restore iptables at once using iptables-restore instead of calling iptables numerous times
Add iptables copnversion script.
Reimplement router.redundant.vrrp.interval settingGlobal setting `router.redundant.vrrp.interval` is not used any more and it is now set to a hardcoded 1.
This results in a failover from master->backup when the backup doesn't hear from the master in ~3.6sec. This is a bit too tight, as we've seen failovers during live migrations. We could reproduce it in about half of the cases. Setting this to setting to 2 (tested it by hardcoding it in the systemvms) gives twice as much time and we didn't see issues any more. Instead of updating the hardcoded setting from 1 to 2, I reimplemented the global setting by sending it to the router with the cmd_line, as the non-VPC router also does.
Background:
Why is the maximum failover time in the example 3.6 seconds? This comes from the advertisement interval and the skew time. The default advertisement interval is 1 second (configurable in keepalived.conf). The skew time helps to keep everyone from trying to transition at once. It is a number between 0 and 1, based on the formula (256 - priority) / 256
As defined in the RFC, the backup must receive an advertisement from the master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything from the master, it takes over. With a backup router priority of 100 (as in the example), the failover will happen at most 3.6 seconds after the master goes down.
Source: http://www.hollenback.net/KeepalivedForNetworkReliability
* pr/1486:
Configure rVPC for router.redundant.vrrp.interval advert_int setting
Have rVPCs use the router.redundant.vrrp.interval setting
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Restore iptables at once using iptables-restore instead of calling iptables numerous timesThis makes handling the firewall rules about 50-60 times faster because it is generated in memory and then loaded once. It's work by @borisroman see PR #1400. Reopened it here because I think this is a great improvement.
* pr/1482:
Resolve conflict as forceencap is already in master
Split the cidr lists so we won't hit the iptables-resture limits
Check the existence of 'forceencap' parameter before use
Do not load previous firewall rules as we replace everyhing anyway
Wait for dnsmasq to finish restart
Remove duplicate spaces, and thus duplicate rules.
Restore iptables at once using iptables-restore instead of calling iptables numerous times
Add iptables copnversion script.
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Honour GS use_ext_dns and redundant VR VIPThis patch addresses two issues:
On redundant VR setups, the primary resolver being handed out to instances is the guest_ip (primary IP for the VR). This might lead to problems upon failover, at least while the DHCP lease doesn't update (because the primary resolver will be checked first until times out, however it'll be gone upon failover).
If Global Setting use_ext_dns is true, we don't want the VR to be the primary resolver at all.
* pr/1536:
This patch addresses two issues:
Signed-off-by: Will Stevens <williamstevens@gmail.com>
* 4.8:
CLOUDSTACK-9287 - Improve test by checking if pvt gw is removed and fix typos
Handle private gateways more reliably
CLOUDSTACK-9287 - Fix RVR public interface
CLOUDSTACK-9287 - Add integration test to cover the private gateway related changes
CLOUDSTACK-9287 - Refactor the interface state configuration
CLOUDSTACK-9287 - Check if the nic profile has already been removed from a certain router
CLOUDSTACK-9287 - Bring up the private gw interface on state change to master
CLOUDSTACK-9287 - Make sure private gw interface is not used for default gw
CLOUDSTACK-9287 - Add integration test to cover the private gw interface/mac address issues
CLOUDSTACK-9287 - Put private gateway interface down on backup router
CLOUDSTACK-9287 - Generate new mac address if router is redundant and nic profile exists
Add private gateway IP to router initialization config
apply static routes on change to master state
* 4.7:
CLOUDSTACK-9287 - Improve test by checking if pvt gw is removed and fix typos
Handle private gateways more reliably
CLOUDSTACK-9287 - Fix RVR public interface
CLOUDSTACK-9287 - Add integration test to cover the private gateway related changes
CLOUDSTACK-9287 - Refactor the interface state configuration
CLOUDSTACK-9287 - Check if the nic profile has already been removed from a certain router
CLOUDSTACK-9287 - Bring up the private gw interface on state change to master
CLOUDSTACK-9287 - Make sure private gw interface is not used for default gw
CLOUDSTACK-9287 - Add integration test to cover the private gw interface/mac address issues
CLOUDSTACK-9287 - Put private gateway interface down on backup router
CLOUDSTACK-9287 - Generate new mac address if router is redundant and nic profile exists
Add private gateway IP to router initialization config
apply static routes on change to master state
Handle private gateways more reliablyWhen initialising a VPC router we need to know which IP/device corresponds to a private gateway. This is to solve a problem when stop/starting a VPC router (which gets the private gateway config as a guest network and as a result breaks the functionality). You read it right, the private gateway is sent as type=guest after reboot and type=public initially.
Before this change, you could add a private gw to a running router but you couldn't restart it (it would mix up the tiers). Now the private gateway is detected properly and it works just fine.
Booting without private gateway:
```
root@r-167-VM:~# cat /etc/cloudstack/cmdline.json
{
"config": {
"baremetalnotificationapikey": "V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw",
"baremetalnotificationsecuritykey": "OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w",
"disable_rp_filter": "true",
"dns1": "8.8.8.8",
"domain": "cs2cloud",
"eth0ip": "169.254.0.42",
"eth0mask": "255.255.0.0",
"host": "192.168.22.61",
"name": "r-167-VM",
"port": "8080",
"privategateway": "None",
"redundant_router": "false",
"template": "domP",
"type": "vpcrouter",
"vpccidr": "10.0.0.0/24"
},
"id": "cmdline"
```
Booting with private gateway:
```
root@r-167-VM:~# cat /etc/cloudstack/cmdline.json
{
"config": {
"baremetalnotificationapikey": "V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw",
"baremetalnotificationsecuritykey": "OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w",
"disable_rp_filter": "true",
"dns1": "8.8.8.8",
"domain": "cs2cloud",
"eth0ip": "169.254.2.227",
"eth0mask": "255.255.0.0",
"host": "192.168.22.61",
"name": "r-167-VM",
"port": "8080",
"privategateway": "10.201.10.1",
"redundant_router": "false",
"template": "domP",
"type": "vpcrouter",
"vpccidr": "10.0.0.0/24"
},
"id": "cmdline"
```
And:
```
cat cmdline
vpccidr=10.0.0.0/24 domain=cs2cloud dns1=8.8.8.8 privategateway=10.201.10.1 template=domP name=r-167-VM eth0ip=169.254.2.227 eth0mask=255.255.0.0 type=vpcrouter disable_rp_filter=true baremetalnotificationsecuritykey=OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w baremetalnotificationapikey=V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw host=192.168.22.61 port=8080
```
Logs:
```
2016-02-24 20:08:45,723 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-458d4c52 job-1402/job-1403 ctx-d5355fca) (logid:5772906c) Set privategateway field in cmd_line.json to 10.201.10.1
```
* pr/1474:
Handle private gateways more reliably
Add private gateway IP to router initialization config
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Apply static routes on change to master stateRefactored static routes for private gateways so they also get loaded when the router switches to master state. Otherwise they're lost and connections drop after fail over.
* pr/1472:
apply static routes on change to master state
Signed-off-by: Will Stevens <williamstevens@gmail.com>
CLOUDSTACK-9287 - Fix unique mac address per rVPC routerThis is work by @wilderrodrigues, see PR #1413 It contains important fixes and I think it needs to be included so I send the PR again.
* pr/1483:
CLOUDSTACK-9287 - Improve test by checking if pvt gw is removed and fix typos
CLOUDSTACK-9287 - Fix RVR public interface
CLOUDSTACK-9287 - Add integration test to cover the private gateway related changes
CLOUDSTACK-9287 - Refactor the interface state configuration
CLOUDSTACK-9287 - Check if the nic profile has already been removed from a certain router
CLOUDSTACK-9287 - Bring up the private gw interface on state change to master
CLOUDSTACK-9287 - Make sure private gw interface is not used for default gw
CLOUDSTACK-9287 - Add integration test to cover the private gw interface/mac address issues
CLOUDSTACK-9287 - Put private gateway interface down on backup router
CLOUDSTACK-9287 - Generate new mac address if router is redundant and nic profile exists
Signed-off-by: Will Stevens <williamstevens@gmail.com>
On redundant VR setups, the primary resolver being handed out to instances is the guest_ip (primary IP for the VR). This might lead to problems upon failover, at least while the DHCP lease doesn't update (because the primary resolver will be checked first until times out, however it'll be gone upon failover).
If Global Setting use_ext_dns is true, we don't want the VR to be the primary resolver at all.
CLOUDSTACK-9336 surround the execution of baremetal-vr.py with condition
* pr/1463:
CLOUDSTACK-9336 surround the execution of baremetal-vr.py with condition
Signed-off-by: Will Stevens <williamstevens@gmail.com>
If the size directive is used, logrotate will ignore the daily, weekly, monthly,
and yearly directives.
remove cloud-cleanup
This script does not do anything because it fails due missing /var/log/cloud directory. Logrotate is used for this functionality.
* 4.8:
CLOUDSTACK-9172 Added cross zones check to delete template and iso
Check the existence of 'forceencap' parameter before use
systemvm: set default umask 022 in injectkeys.sh
* 4.7:
CLOUDSTACK-9172 Added cross zones check to delete template and iso
Check the existence of 'forceencap' parameter before use
systemvm: set default umask 022 in injectkeys.sh
Check the existence of 'forceencap' parameter before useCheck the existence of 'forceencap' parameter before use.
Error seen:
```
Traceback (most recent call last):
File "/opt/cloud/bin/update_config.py", line 140, in <module>
process_file()
File "/opt/cloud/bin/update_config.py", line 54, in process_file
finish_config()
File "/opt/cloud/bin/update_config.py", line 44, in finish_config
returncode = configure.main(sys.argv)
File "/opt/cloud/bin/configure.py", line 1003, in main
vpns.process()
File "/opt/cloud/bin/configure.py", line 488, in process
self.configure_ipsec(self.dbag[vpn])
File "/opt/cloud/bin/configure.py", line 544, in configure_ipsec
file.addeq(" forceencaps=%s" % CsHelper.bool_to_yn(obj['encap']))
KeyError: 'encap'
```
* pr/1402:
Check the existence of 'forceencap' parameter before use
Signed-off-by: Will Stevens <williamstevens@gmail.com>
They might never appear.. for example when we have entries in
/etc/cloudstack/ips.json that haven't been plugged yet. Waiting
this long makes everything horribly slow (every vm, interface,
static route, etc, etc, will hit this wait, for every device).
* 4.8:
Display hostname the VPC router runs on
CLOUDSTACK-9266: Make deleting static routes in private gw work
CLOUDSTACK-9264: Make /32 static routes for private gw work
* 4.7:
Display hostname the VPC router runs on
CLOUDSTACK-9266: Make deleting static routes in private gw work
CLOUDSTACK-9264: Make /32 static routes for private gw work
CLOUDSTACK-9256 add unique key for static routes in jsonStatic routes that are being set do not show up in the static_routes.json file. The reason for this is that the index that is used, is the gateway address, which is not unique. Hence stuff is overwritten and lost.
Ping @borisroman @wilderrodrigues @DaanHoogland
* pr/1364:
CLOUDSTACK-9256 add unique key for static routes in json
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.7:
CLOUDSTACK-9254: Make longer names display pretty
CLOUDSTACK-9245 - Deletes ACL items when destroying the VPC or deleting the ACL itself
CLOUDSTACK-9245 - Formatting NetworkACLServiceImpl class
CLOUDSTACK-9245 - Formatting VpcManagerImpl class
CLOUDSTACK-9245 - Formatting NetworkACLManagerImpl class
More VR performance!
* 4.7:
Refactor public ip retrieval into method
CLOUDSTACK-9244 Fix setting up RFC1918 routes
CLOUDSTACK-9239 throw exception on deprecated command
Enhance VR performance by selectively executing tasks instead of brute-forcing
CLOUDSTACK-9236: Load Balancing Health Check button displayed when non-NetScaler offering is used
* 4.7:
CLOUDSTACK-9154 - Sets the pub interface down when all guest nets are gone
CLOUDSTACK-9187 - Makes code ready for more something like ethXXXX, if we ever get that far
CLOUDSTACK-9188 - Reads network GC interval and wait from configDao
CLOUDSTACK-9187 - Fixes interface allocation to VRRP instances
CLOUDSTACK-9187 - Adds test to cover multiple nics and nic removal
CLOUDSTACK-9154 - Adds test to cover nics state after GC
CLOUDSTACK-9154 - Returns the guest iterface that is marked as added
Conflicts:
engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java
[4.7] Critical VPCVR issues fixed: CLOUDSTACK-9154; CLOUDSTACK-9187; and CLOUDSTACK-9188This PR applies the same fixes as in the PR #1259, but against branch 4.7.
Please refer to PR #1259 for the tests results and all the comments already made there.
Issues fixed are:
* CLOUDSTACK-9154: rVPC doesn't recover from cleaning up of network garbage collector
* CLOUDSTACK-9187: rVPC routers in Master/Master due to concurrency problem when writing the keepalivd.conf
* CLOUDSTACK-9188: NetworkGarbageCollector is not using gc.interval and gc.wait from settings
Those changes have been covered by 2 new tests added to ```smoke/test_vpc_redundant.py```:
* test_04_rvpc_network_garbage_collector_nics
* test_05_rvpc_multi_tiers
The test ```test_04_rvpc_network_garbage_collector_nics``` depends on the global settings for the network.gc.interval and gc.wait. If one wants the test to run quicker, please change the settings (default is 600 seconds for each) and restart the Management Server before running the tests. I would suggest to set it to 60 seconds.
In addition, the NetworkGarbageCollector was redefining the settings above mentioned and not reading their values through ConfigDao. Due to that, the settings were not being applied properly and the test was waiting to long to check the VPC routers.
* pr/1277:
CLOUDSTACK-9154 - Sets the pub interface down when all guest nets are gone
CLOUDSTACK-9187 - Makes code ready for more something like ethXXXX, if we ever get that far
CLOUDSTACK-9188 - Reads network GC interval and wait from configDao
CLOUDSTACK-9187 - Fixes interface allocation to VRRP instances
CLOUDSTACK-9187 - Adds test to cover multiple nics and nic removal
CLOUDSTACK-9154 - Adds test to cover nics state after GC
CLOUDSTACK-9154 - Returns the guest iterface that is marked as added
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.7:
CLOUDSTACK-9222 Prevent cloud.log.1 filling up the disk
Add integration test for restartVPC with cleanup, and Private Gateway enabled.
Nullpointer Exception in NicProfileHelperImpl
CLOUDSTACK-9222 Prevent cloud.log.1 filling up the diskDelay Compress results in more space usage than needed. Since we have copy truncate we don't need it.
* pr/1329:
CLOUDSTACK-9222 Prevent cloud.log.1 filling up the disk
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.7:
Fix unable to setup more than one Site2Site VPN Connection
FIX S2S VPN rVPC: Check only redundant routers in state MASTER
PEP8 of integration/smoke/test_vpc_vpn
Add S2S VPN test for Redundant VPC
Make integration/smoke/test_vpc_vpn Hypervisor independant
FIX VPN: non-working ipsec commands
[UI] MADNESS
[DB] Add force_encap field to s2s_customer_gateway table
[ROUTER] Add forceencaps field to python router ipsec config method
[TEST] unittest needs rework
[MARVIN] Add forceencap field to VpnCustomerGateway class in marvin base
[CORE] Add Force UDP Encapsulation option to Site2Site VPN
CLOUDSTACK-9186: Root admin cannot see VPC created by Domain admin user
CLOUDSTACK-9192: UpdateVpnCustomerGateway is failing
CLOUDSTACK-6485 prevent ip asignment of private gw iface
CLOUDSTACK-9204 Do not error when staticroute is already gone
make both check lines consistent
CLOUDSTACK-9181 Prevent syntax error in checkrouter.sh
CLOUDSTACK-9202 Bump ssh timeout
[4.7] FIX Site2SiteVPN on redundant VPCThis PR:
- fixes the inability to setup more than one Site2Site VPN connection from a VPC
- fixes starting of Site2Site VPN on redundant VPC
- fixes Site2Site VPN state checking on redundant VPC
- improves the vpc_vpn test to allow multple hypervisors
- adds an integration test for Site2Site VPN on redundant VPC
Tested it on 4.7 single Xen server zone:
command:
```
nosetests --with-marvin --marvin-config=/data/shared/marvin/mct-zone1-xen1.cfg -a tags=advanced,required_hardware=true /tmp/test_vpc_vpn.py
```
results:
```
Test Site 2 Site VPN Across redundant VPCs ... === TestName: test_01_redundant_vpc_site2site_vpn | Status : SUCCESS ===
ok
Test Remote Access VPN in VPC ... === TestName: test_01_vpc_remote_access_vpn | Status : SUCCESS ===
ok
Test Site 2 Site VPN Across VPCs ... === TestName: test_01_vpc_site2site_vpn | Status : SUCCESS ===
ok
----------------------------------------------------------------------
Ran 3 tests in 1490.076s
OK
```
also performed numerous manual inspections of state of VPN connections and connectivity between VPC's
* pr/1276:
Fix unable to setup more than one Site2Site VPN Connection
FIX S2S VPN rVPC: Check only redundant routers in state MASTER
PEP8 of integration/smoke/test_vpc_vpn
Add S2S VPN test for Redundant VPC
Make integration/smoke/test_vpc_vpn Hypervisor independant
FIX VPN: non-working ipsec commands
Signed-off-by: Remi Bergsma <github@remi.nl>
CLOUDSTACK-9181 Prevent syntax error in checkrouter.shAdded quotes to prevent syntax errors in weird situations.
Error seen in mgt server:
```
2015-12-15 14:30:32,371 DEBUG [c.c.a.m.AgentManagerImpl] (RedundantRouterStatusMonitor-7:ctx-0dd8ef3e) Details from executing class com.cloud.agent.api.CheckRouterCommand: Status: UNKNOWN
/opt/cloud/bin/checkrouter.sh: line 28: [: =: unary operator expected
/opt/cloud/bin/checkrouter.sh: line 31: [: =: unary operator expected
```
Cause:
```
root@r-1191-VM:/opt/cloud/bin# ./checkrouter.sh
./checkrouter.sh: line 28: [: =: unary operator expected
./checkrouter.sh: line 31: [: =: unary operator expected
Status: UNKNOWN
```
Somehow a nic was missing.
After fix the script can handle this:
```
root@r-1191-VM:/opt/cloud/bin# ./checkrouter.sh
Status: UNKNOWN
```
The other states are also reported fine:
```
root@r-1191-VM:/opt/cloud/bin# ./checkrouter.sh
Status: MASTER
```
```
root@r-1192-VM:/opt/cloud/bin# ./checkrouter.sh
Status: BACKUP
```
While at it, I also removed the INTERFACES variable/constant as it was only used once and hardcoded the second time. Now both are hardcoded and easier to read.
* pr/1296:
make both check lines consistent
CLOUDSTACK-9181 Prevent syntax error in checkrouter.sh
Signed-off-by: Remi Bergsma <github@remi.nl>
CLOUDSTACK-9204 Do not error when staticroute is already goneWhen deleting a static route fails because it isn't there any more (KeyError), it should succeed instead.
Error seen:
```
[INFO] Processing JSON file static_routes.json.1451560145
Traceback (most recent call last):
File "/opt/cloud/bin/update_config.py", line 140, in <module>
process_file()
File "/opt/cloud/bin/update_config.py", line 52, in process_file
qf.load(None)
File "/opt/cloud/bin/merge.py", line 258, in load
proc = updateDataBag(self)
File "/opt/cloud/bin/merge.py", line 91, in _init_
self.process()
File "/opt/cloud/bin/merge.py", line 131, in process
dbag = self.process_staticroutes(self.db.getDataBag())
File "/opt/cloud/bin/merge.py", line 179, in process_staticroutes
return cs_staticroutes.merge(dbag, self.qFile.data)
File "/opt/cloud/bin/cs_staticroutes.py", line 26, in merge
del dbag[key]
KeyError: u'192.168.0.3'
```
* pr/1298:
CLOUDSTACK-9204 Do not error when staticroute is already gone
Signed-off-by: Remi Bergsma <github@remi.nl>
- Refactors the set_backup, set_master and set_fault methods to have better names for the variable
- Increase the sleep on the test in order to wait for the routers to be ready. It's now 3 times the GC settings
CLOUDSTACK-9155 make sure logrotate is effective for cloud.logMany processes on the VRs log to cloud.log. When log rotate kicks in, the file is rotated but the scripts still write to the old inode (cloud.log.1 after rotate). Tis quickly fills up the tiny log partition.
Using 'copytruncate' is a small tradeoff, there is a slight change of missing a log entry, but in the old situation nothing ended up in cloud.log after rotate (except for stuff that was (re)started) so I think this is the best solution until we properly rewrite the script to either use their own script or syslog.
More details: https://issues.apache.org/jira/browse/CLOUDSTACK-9155
* pr/1235:
CLOUDSTACK-9155 make sure logrotate is effective
Signed-off-by: Remi Bergsma <github@remi.nl>
Many processes on the VRs log to cloud.log. When logrotate
kicks in, the file is rotated but the scripts still write
to the old inode (cloud.log.1 after rotate). Tis quickly
fills up the tiny log partition.
Using 'copytruncate' is a tradeoff, there is a slight
change of missing a log entry, but in the old situation
we were missing all of them after logrotate.
CLOUDSTACK-9151 - As a Developer I want the VRID to be set within the limits of KeepaliveDThis PR fixes a blocker issue!
- Just like with RVRs, use the VRID 51 instead of making it dependent on the VPCID
- Reason: arbitary unique number 0..255 used to differentiate multiple instances of vrrpd running on the same NIC (and hence same socket). virtual_router_id 51
* pr/1231:
CLOUDSTACK-9151 - Removes the replacement of the VRID in the CsRedundant file
Signed-off-by: Remi Bergsma <github@remi.nl>
Updating pom.xml version numbers for release 4.6.2-SNAPSHOTSet next version in 4.6 release branch to version 4.6.2-SNAPSHOT.
Using ` ./tools/build/setnextversion.sh`.
Ping @bhaisaab @DaanHoogland before we merge this, how will we be creating the upgrade paths from 4.6.2 to 4.7? After this PR is merged, we need to manually do a fwd-merge and make sure we keep the pom versions in master/4.7. Much like in #1071.
* pr/1186:
Fixed typo in iam/pom.xml
Updating pom.xml version numbers for release 4.6.2-SNAPSHOT
Signed-off-by: Daan Hoogland <daan@onecht.net>
- Just like with RVRs, use the VRID 51 instead of making it dependent on the VPCID
- Reason: arbitary unique number 0..255 used to differentiate multiple instances of vrrpd running on the same NIC (and hence same socket). virtual_router_id 51
Setup general route for RFC 1918 space, as otherwise it will be sent to
the public gateway and not work. More specific routes that may be set
have preference over this generic routes.
When public network is RFC1918, we do not setup the routes to avoid
problems with internal-only deployments.
* 4.6:
CLOUDSTACK-9106 - Makes Enum name compliant with Java code conventions.
CLOUDSTACK-9106 - Adds a test to cover the changes in the applyVpnUsers() method
CLOUDSTACK-9106 - Makes the router commands call more consistent.
CLOUDSTACK-9106 - Enables private gateway tests on Redundant VPCs
CLOUDSTACK-9106 - Refactor the createPrivateNicProfileForGateway() method
CLOUDSTACK-9106 - Reduces the amount of iterations through the routers of a VPC
Add support for not (re)starting server after cloud-setup-management.
Closed PRs that will not be considered for merge:
This closes#1158
This closes#1097
- Use the router to retrieve the instance ID
- Check if the VPC is redundant in order to reuse the private gateway address.
- Brings the private gateways interfaces up.
CLOUDSTACK-9105: Logging enhancement: Handle/reference to track API calls end to end in the MS logs
Added logid to logging framework, now all API call logs can be tracked with this id end to end
* pr/1167:
CLOUDSTACK-9105: Logging enhancement: Handle/reference to track API calls end to end in the MS logs Added logid to logging framework, now all API call logs can be tracked with this id end to end
Signed-off-by: Daan Hoogland <daan@onecht.net>
Send arping to the gateway instead of our own addressWe need to send an Unsolicited ARP to the gateway, instead of our own address. We now encounter problems when people deploy/destroy/deploy and get the same public ip.
Packets arrive, but with incorrect / cached mac and are ignored by the routervm kernel.
Run arping manually to update the arp-cache on the gateway and things start to work.
Then we discovered the `arping` is actually done, but sent to its own address. Therefore the gateway doesn't pick it up. We only saw this happening when rapid deploy tools are used, like Terraform that do deploy/destroy/deploy and might get the same ip but on a new router having a new mac.
```
2015-12-03 18:07:25,589 CsHelper.py execute:160 Executing: arping -c 1 -I eth1 -A -U -s 192.168.23.8 192.168.23.1
```
The integration tests seem happy, although the full run is still ongoing:
```
=== TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
```
Thanks @sspans for helping trouble shoot this. Ping @wilderrodrigues can you review please?
* pr/1163:
CLOUDSTACK-9097 Make public ip work immediately
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.6:
CLOUDSTACK-9075 - Uses the same vlan since it should have been already released
CLOUDSTACK-9075 - Adds VPC static routes test
CLOUDSTACK-9075 - Covers Private GW ACL with Redundant VPCs
CLOUDSTACK-9075 - Add method to get list of Physical Networks per zone
CLOUDSTACK-6276 Removing unused parameter in integration test for projects
CLOUDSTACK-6276 Removing unused parameter in integration test
CLOUDSTACK-6276 Fixing affinity groups for projects
We need to send an Unsolicited ARP to the gateway, instead of our own address. We now encounter problems when people deploy/destroy/deploy and get the same public ip.
CLOUDSTACK-9062: Improve S3 implementation.The S3 implementation is far from finished, this commit focuses on the bases.
- Upgrade AWS SDK to latest version.
- Rewrite S3 Template downloader.
- Rewrite S3Utils utility class.
- Improve addImageStoreS3 API command.
- Split various classes for convenience.
- Various minor improvements and code optimizations.
A side effect of the new AWS SDK is that it, by default, uses the V4 signature. Therefore I added an option to specify the Signer, so it stays compatible with previous versions.
Please review thoroughly, both code inspection and (automated) integration tests. Currently no integration tests are available specifically for S3. Therefore the implementation is needed to be tested manually, for now...
What I tested:
- Greenfield install -> will download latest systemvm template automatically to S3.
- Upload a template/iso
- Download a template/iso
- Restart of management server -> list available templates -> doesn't download them again if available.
* pr/1083:
CLOUDSTACK-9062: Improve S3 implementation.
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.6:
CLOUDSTACK-9015 - Delete public IP in order to get both IP and NAT rule removed.
CLOUDSTACK-9015 - Add test to cover the rVPC routers stop/start/reboot scenario
CLOUDSTACK-9015 - Make sure the Backup router can talk to the Master router after a stop/start/reboot
CLOUDSTACK-9067 - As I developer I want to remove all the unused router-shell scripts from ACSThis PR removes the unused shell scripts that were present in the ACS project. Those script were replaced by the.
Some of the scripts are used by the HyperV Resource, which were hardcoded. I took the opportunity to use the Java constants over there as well, so the next one touching the code will know they exist and won't hardcode anything.
The following task were applied:
* Remove the shell files and the Java constants that were mapping them;
* Apply the use of the Java constants to the HyperV Resource class;
* Wrap the String.format() method in the StringUtils so we can test the changes in the HyperV Resource class.
The last point was added because I do not have a HyperV test environment. Hence, I wanted to make sure the tiny code I changed is covered at least by unit tests.
* pr/1084:
CLOUDSTACK-9067 - Replaces hardcoded paths with the VRScripts constants.
CLOUDSTACK-9067 - Fomatting the code of HypervDirectConnectResource class
CLOUDSTACK-9067 - Remove old script file from the project
Signed-off-by: Remi Bergsma <github@remi.nl>
[4.6.1] CLOUDSTACK-9015 - Redundant VPC Virtual Router's state is BACKUP & BACKUP or MASTER & MASTERThis PR closes#1064
All the details can be found in the original PR, which won't be merged because it was created agains master. Once this PR is closed, the original one will be also closed.
* pr/1070:
CLOUDSTACK-9015 - Delete public IP in order to get both IP and NAT rule removed.
CLOUDSTACK-9015 - Add test to cover the rVPC routers stop/start/reboot scenario
CLOUDSTACK-9015 - Make sure the Backup router can talk to the Master router after a stop/start/reboot
Signed-off-by: Remi Bergsma <github@remi.nl>
The S3 implementation is far from finished, this commit focusses on the bases.
- Upgrade AWS SDK to latest version.
- Rewrite S3 Template downloader.
- Rewrite S3Utils utility class.
- Improve addImageStoreS3 API command.
- Split various classes for convenience.
- Various minor improvements and code optimalisations.
A side effect of the new AWS SDK is that it, by default, uses the V4 signature. Therefore I added an option to specify the Signer, so it stays compatible with previous versions.
CLOUDSTACK-9058 - Respond with "saved_password" if no password is to be issued.The password server on the virtual router should respond with "saved_password" if no password is to be issued. This allows for backwards compatibility with Windows Guest VMs which require the "saved_password" response.
* pr/1079:
CLOUDSTACK-9058
Signed-off-by: Remi Bergsma <github@remi.nl>
- Stop KeepaliveD/ConntrackD if the eth2 (guest) interface is not configured and UP
- Only setup the redundancy after all the router configuration is done
- Open the FW for the VRRP communitation
- 224.0.0.18 and 225.0.0.50
- Set keepalived.conf.templ by default to use interface eth2 (guest)
- It will be reconfigured anyway, but having eth2 there is more clear
CLOUDSTACK-8993: DHCP fails with "no address available" when an IP is reused
Repopulate /etc/dhcphosts.txt to remove old entries with the same IP address.
* pr/981:
CLOUDSTACK-8993: DHCP fails with "no address available" when an IP is reused
Signed-off-by: Remi Bergsma <github@remi.nl>
- If we stop/start a router, the state in the file will still say MASTER, when it is actually not
- Checking the state based on the interface (eth1) state
- Once master.py is called by keepalived, save the state in the json file to BACKUP just to make sure it's also written there
- Do not use the API call because it will read what is in the database, that might not have been updated yet
* Check the status in the router directly instead
- Remove all the sleeps
- It was working before because the Routers were restarting about 10 times for each operation
e.g. adding a VM to a network ot acquiring a new IP.
- Adding stat_rules of internal LB to iptables
We needed one extra rule in the INPUT chain
- With the keepalived fixed they should not be needed anymore. So first reducing them drasticaly
- I am now making a backup of the template file, write to the template file and compare it with the existing configuration
- The template file is recovered afer the process
- I also check if the process is running
- I fixed a bug in the compare method
- I am now updating the configuration variable once the file content is flushed to disk