But still keep customized iptables because the newer version of iptables would
result in very big range of package upgrading due to dependence relationship.
Also add newer version of "virt-what" from debian testing, otherwise it would
provide wrong information for xen-domU(reported as xen-dom0 in previous version
for 2.6.39). This one have no dependency issue and can be added easily.
status 11056: resolved fixed
reviewed-by: Abhi
Summary of Changes:
while adding a primary address to the domR interface, previous primaray addresses(ip) are removed and added as with 32-bit netmask.
This is to avoid two same ip's with different netmask attached to the interface.
The following are summary of changes:
1) when network.disable.rpfilter is set to true, then rp_filter will be disadbled(set to 0) on all the public interfaces and also default setting of the system.
2) when network.disable.rpfilter is set to false, then rp_filter will be enabled(set to 1) on all the public interfaces and also default setting of the system.
3) here public public interface means , eth2 ... ethN. default setting means (/proc/sys/net/ipv4/conf/default/rp_filter).
4) Default setting change will have impact on non-public interface. Due to these, rp_filter is always enabled on Non-public interfaces(eth0,eth1 and lo).
5) when a new public interface is created, new interface will take rp_filter value from the default setting.
The following are summary of changes:
1) when network.disable.rpfilter is set to true, then rp_filter will be disadbled(set to 0) on all the public interfaces and also default setting of the system.
2) when network.disable.rpfilter is set to false, then rp_filter will be enabled(set to 1) on all the public interfaces and also default setting of the system.
2) here public public interface means , eth2 ... ethN. default setting means (/proc/sys/net/ipv4/conf/default/rp_filter).
3) Default setting change will have impact on non-public interface.if there is no specific setting for other interfaces in /etc/sysctl.conf or otherexplict setting , they will follow this default settings. currently non-public interface like eth0 ,eth1 does not have any specific setting in sysctl.conf, due to this there rp_filters will be changed when ever network.disable.rpfilter setting is changed.
4) default setting is required to changes beacuse when a new public interface is created, new interface will take rp_filter value from the default setting.
Because currently the lock in the script is retried every 1 second, and it's a
quite a long time that it's possible for some other active script can be
executed and retain the lock again. So it's possible that the first one request
the lock is always being preemptted by others, then finally got timeout.
To fix this issue, the retry interval is reduced to 0.1 seconds, which would
provide more retry times. And each process want to get the lock would create a
file named lockname-PID.lock, and only the first one(judged by timestamp) would
get the lock. The remaining ones would retry every 0.1 seconds to see if it can
get the lock.
Also timeout time is extended to 30 seconds.
And add testcase for it.
status 11772: resolved fixed
To solve password file is destroyed along with restartNetwork command issue. If
the password is not set in fact, user can use "ResetPassword" to try again. But
it won't happen mostly, because it's only possible if the restartNetwork
happened between user start up VM and set the new password.
Reviewed-by: Keshav
status 11518: resolved fixed
Added New value "link-local" to global config network.loadbalancer.haproxy.stats.visibility . With this change it can take new parameter "link-local" value apart from the existing 3 values global,guest-network,disabled.
global - stats visible from public network
guest-network - stats visible only to guestnetwork.
link-local - stats visible only to link local network
disabled - stats disabled.
Added global config to enable/disable rp_filter for domR.
previous commit: d966906374d4a0cb8fa57326a1f7625c871f64fd
Test Case-1 :
1) Set network.disable.rpfilter global config to true
2) Restart the domR
3) check the settings reflected in proc filesystem
- for public interface like eth2,eth3 : /proc/sys/net/ipv4/conf/eth2/rp_filter should have 0 , and rest other interfaces should have value of 1
Test Case-2 :
1) set network.disable.rpfilter global config to false
2) Restart the domR
3) check the settings reflected in proc filesystem
- for public interface like eth2,eth3 : /proc/sys/net/ipv4/conf/eth2/rp_filter should have 1 , and rest other interfaces should also have value of 1
This message may show during redundant router start up:
FAULT (Restarting DNS forwarder and DHCP server: dnsmasq failed!)
This caused by edithost.sh is racy with keepalived process. They both want to
restart dnsmasq.
Even in normal condition, it's very hard to reproduce this bug. Add file lock
for edithost.sh should solve it.
This message may show during redundant router start up:
FAULT (Restarting DNS forwarder and DHCP server: dnsmasq failed!)
This caused by edithost.sh is racy with keepalived process. They both want to
restart dnsmasq.
Even in normal condition, it's very hard to reproduce this bug. Add file lock
for edithost.sh should solve it.
The reason is:
1. In redundant router, we won't enable eth2(public network interface) until
keepalived determine the router is MASTER.
2. ipassoc.sh normally kick in before keepalived process running. And it would
set eth2's IP address using "ip addr add $dev $ip"
3. "ip addr add $dev $ip" won't add mask for the device, then there is no way to
update broadcast address for eth2. Then broadcast address is 0.0.0.0.
4. As long as "ip addr add $dev $ip" executed, later executed "ifconfig $dev $ip
netmask $mask" won't calculated the broadcast address from $ip and $mask.
To fix this, we enable and configure eth2 temporaily when cloud-early-config
executed, then disable eth2 interface. By this way, broadcast address of should
be calculated and set correctly.
status 11083: resolved fixed
The reason is:
1. In redundant router, we won't enable eth2(public network interface) until
keepalived determine the router is MASTER.
2. ipassoc.sh normally kick in before keepalived process running. And it would
set eth2's IP address using "ip addr add $dev $ip"
3. "ip addr add $dev $ip" won't add mask for the device, then there is no way to
update broadcast address for eth2. Then broadcast address is 0.0.0.0.
4. As long as "ip addr add $dev $ip" executed, later executed "ifconfig $dev $ip
netmask $mask" won't calculated the broadcast address from $ip and $mask.
To fix this, we enable and configure eth2 temporaily when cloud-early-config
executed, then disable eth2 interface. By this way, broadcast address of should
be calculated and set correctly.
status 11083: resolved fixed
commit e4fe14a9ce19fbbdb15bbfaad586d80031ca9fbc break redundant router, because
at time of ping, the network is not up for redundant router.
Add timout for ping
The issue happened quite rare, but indeed can show.
And when the issue happen, the status of redundant router would be "Status:
FAULT".
It's due to ipassoc.sh wasn't executed before the system bring eth2 up and go to
master mode, then eth2 wasn't configured correctly. Then "ip route add default
xx" can't complete.
This commit should fixes the issue.
The issue happened quite rare, but indeed can show.
And when the issue happen, the status of redundant router would be "Status:
FAULT".
It's due to ipassoc.sh wasn't executed before the system bring eth2 up and go to
master mode, then eth2 wasn't configured correctly. Then "ip route add default
xx" can't complete.
This commit should fixes the issue.
commit e4fe14a9ce19fbbdb15bbfaad586d80031ca9fbc break redundant router, because
at time of ping, the network is not up for redundant router.
Add timout for ping
Cahnges:
1) putting back the changes(bug 10800 and 10557) that had been reverted during merging of Elb/nectarine.
2) 10800 Upgrade from previous release also added: Upgrade from Previous release will leave iptable rules in the INPUT ipchain, this is fixed.
Part 1
This backport contained:
commit 52317c718c25111c2535657139b541db0c9d1e1f
bug 9154: Initial check in for enabling redundant virtual router
commit 54199112055d754371bfb141168fb5538bf6d6ea
Add host verification for CheckRouterCommand
commit cef978a228c90056ead9be10cbc4de74c2b8de76
Fix CheckRouterAnswer's isMaster report
commit 4072f0a6991ac3b63601a1764fbe14188965f62f
Some build fixes and code refactoring for redundant router
commit 4d3350b7cd8ee2706a9bace4437fc194e36c8dd5
Redundant Router: Fix OVS
commit 6a228830e7c46d819fa0c3317e159e041337e887
Fix findByNetwork()/findByNetworkAndPod()'s return
commit c627777b3d5bdbcd60db4032cebd349a5b1ecd83
Redundant Router: Fix isVmAlive()
commit e1275d2514adc41f8744f5107d4069c38be195f1
Only issue CheckRouterCommand to redundant routers
And all modification to the scripts till
commit 4e3942462ed3fde3a3d7011e95839e2128fba514
logging changes
in the master branch.
Sometime when keepalived start up(during system boot up period), it would fail
to(likely due to unable to receive the packet), and think itself is the only
router, then make itself master.
Add 10 seconds delay after start up to work around the issue.
This patch enable redundant virtual routers.
1. To enable this feature, db need to be updated using follow SQL by now(we
would get a UI way later):
UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND
system_only=0;
2. System would try to start up two routers at different hosts. But if there is
only one host in the zone, system would start up two routers on it.
3. The failover part is using keepalived, and connection tracking part is using
conntrackd. There would be one master router and one backup router. The status
of router(master or backup) can be query from the database table domain_router
now. Management server would update the status every 30s by default.
4. The routers for the same zone would use same external NIC(same ip and mac).
The script used for fail-over would ensure only one external NIC present in the
network at any time.
5. Currently management server don't got the ability to stop one of router is
both of them reported as master. The feature is in the todo list.
After two routers start up, disconnect anyone of them, the guest network
shouldn't be affected, and established connection(http, ssh, etc.) should still
works. The fail-over on gateway part should be 3~4 seconds.
Currently the patch works with KVM. Would deal with vmware and XenServer soon.
haproxy tunning:
0. Test case:
httpd running in 5 user VMs, all of them created on a xenserver host(16 core, 42G memroy, 10G network)
domR running on an anther host with same hardware configuration.
test application, ab, running on anther host behind an anther seperate switch
1.haproxy is not a memory intensive app. I can get 4625.96 connection/s with 1G memory. While it's really a CPU intensive app, domR always uses around 100% CPU on the host.
2.By default, you can't get better connection/s rate, because ip_conntrack_max and tw_bucket are too small, you will see the error in domR like:
"TCP: time wait bucket table overflow" or "nf_conntrack: table full, dropping packet".
So I increase these numbers to 1000000 from 65536, then I can steadly get around 4600 connection/s when memory is >= 1G.
Here is the connection per second, tested by "ab -n 1000000 -c 100 http://192.168.170.152:880/test.html"
domR memory conn/s
128M: 3545.55
256M: 4081.38
512M: 4318.18
1G: 4625.96
7G: 4745.53
3. If I enable notrack for both connections between domr/user vm, and public network, that tell iptable in domR don't track the connection during my test, then I can get better number, around
5800 connections/s. But we can't enable notrack, as iptables is used to track throughput in domR.
4. In a word, with this commit, the connection rate of haproxy can be increased from 1000-2000/s to 4700/s when domR's memory is larger than 1G.
5. How many CPU need to assign to domR to get this number? Haven't finished yet, as CPU is shared by all the VMs on the host, if other VMs are busy, it will impact the performance of haproxy.