The reason is:
1. In redundant router, we won't enable eth2(public network interface) until
keepalived determine the router is MASTER.
2. ipassoc.sh normally kick in before keepalived process running. And it would
set eth2's IP address using "ip addr add $dev $ip"
3. "ip addr add $dev $ip" won't add mask for the device, then there is no way to
update broadcast address for eth2. Then broadcast address is 0.0.0.0.
4. As long as "ip addr add $dev $ip" executed, later executed "ifconfig $dev $ip
netmask $mask" won't calculated the broadcast address from $ip and $mask.
To fix this, we enable and configure eth2 temporaily when cloud-early-config
executed, then disable eth2 interface. By this way, broadcast address of should
be calculated and set correctly.
status 11083: resolved fixed
commit e4fe14a9ce19fbbdb15bbfaad586d80031ca9fbc break redundant router, because
at time of ping, the network is not up for redundant router.
Add timout for ping
The issue happened quite rare, but indeed can show.
And when the issue happen, the status of redundant router would be "Status:
FAULT".
It's due to ipassoc.sh wasn't executed before the system bring eth2 up and go to
master mode, then eth2 wasn't configured correctly. Then "ip route add default
xx" can't complete.
This commit should fixes the issue.
Sometime when keepalived start up(during system boot up period), it would fail
to(likely due to unable to receive the packet), and think itself is the only
router, then make itself master.
Add 10 seconds delay after start up to work around the issue.
This patch enable redundant virtual routers.
1. To enable this feature, db need to be updated using follow SQL by now(we
would get a UI way later):
UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND
system_only=0;
2. System would try to start up two routers at different hosts. But if there is
only one host in the zone, system would start up two routers on it.
3. The failover part is using keepalived, and connection tracking part is using
conntrackd. There would be one master router and one backup router. The status
of router(master or backup) can be query from the database table domain_router
now. Management server would update the status every 30s by default.
4. The routers for the same zone would use same external NIC(same ip and mac).
The script used for fail-over would ensure only one external NIC present in the
network at any time.
5. Currently management server don't got the ability to stop one of router is
both of them reported as master. The feature is in the todo list.
After two routers start up, disconnect anyone of them, the guest network
shouldn't be affected, and established connection(http, ssh, etc.) should still
works. The fail-over on gateway part should be 3~4 seconds.
Currently the patch works with KVM. Would deal with vmware and XenServer soon.