Failover Internet Connection - Waikato Linux Users Group


I'll describe a simple way of setting up a redundant internet connection.

My setup uses my main server as the 'router' for the network. Another machine, elsewhere on the network, has a radio card in it. All traffic is directed to the main server, which then routes it on to the radio machine, which does the NAT. I also have a second connection: a so-called "Internet Hub", which takes Ethernet in and has a serial line out, connected to a modem. This does dial-on-demand and NAT.

Radio machine IP: 10.0.0.254 (name radio)
Internet Hub IP: 10.0.0.253 (name modem)
Server IP: 10.0.0.1

So I can change which internet connection I use simply by changing the default route on the server. I set up my default gateway as radio, with a metric of 0:

route add default gw radio metric 0

I then add the backup route in as well, with a higher metric

route add default gw modem metric 10

The higher metric makes the backup route less preferred. As I don't have load balancing enabled, it isn't used at all while the metric 0 route is present.

So, if I discover that the radio connection has gone down, I can fail over manually by issuing

route del default gw radio

When it comes back up, I can reinstate it as the default link by doing

route add default gw radio metric 0
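On systems that ship iproute2 instead of the net-tools route command, the same steps can be expressed with ip route. The following is a sketch only and not part of the original setup: it uses the gateway addresses listed earlier, and run is a hypothetical wrapper that just prints each command unless DRY_RUN=0 is set, so it can be tried without root.

```shell
# Sketch only: iproute2 equivalents of the route commands above.
# RADIO and MODEM are the gateway addresses from this page; run() is a
# hypothetical helper that prints commands unless DRY_RUN=0 is set.
RADIO=10.0.0.254
MODEM=10.0.0.253

run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

setup_routes() {
    run ip route add default via "$RADIO" metric 0    # preferred route
    run ip route add default via "$MODEM" metric 10   # backup route
}

fail_over() {
    run ip route del default via "$RADIO"             # backup takes over
}

fail_back() {
    run ip route add default via "$RADIO" metric 0    # radio preferred again
}

setup_routes
```

Calling fail_over and fail_back corresponds to the manual route del and route add steps above.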

The simplest way of knowing if the link has gone down is to ping a remote host. However, I need to make sure I am pinging said host via the radio link, or else when I drop the default route, the ping will succeed via the backup route. So I add a route to the next-hop after the radio link.

route add ip.of.next.hop gw radio

I can then ping ip.of.next.hop - if it succeeds, the radio link is up. If it fails, the radio link is down.
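The check can be wrapped in a small shell function. This is a minimal sketch: TARGET is a placeholder for the next-hop address, the PING variable exists only so the function can be exercised without a network, and it assumes a ping (such as the iputils one) that exits non-zero when no replies come back.

```shell
# Sketch: report whether the primary link is up by pinging the next hop.
# TARGET is a placeholder; PING is overridable for testing.
TARGET=${TARGET:-ip.of.next.hop}
PING=${PING:-ping}

link_is_up() {
    # -c 5: send five probes; -i 1: one second apart.
    # The function's exit status is ping's exit status.
    "$PING" -c 5 -i 1 "$TARGET" > /dev/null 2>&1
}
```

Usage would look like: if link_is_up; then echo "radio link up"; else echo "radio link down"; fi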

I can automate this in a number of ways. The one I settled on is a script that cron runs every minute:

:/usr/local/sbin# cat check-internet.pl

#!/usr/bin/perl

$primary   = "radio";
$backup    = "modem";
$target    = "next-hop";
$emailaddr = "daniel";

$state_file = "/var/state/route-checks";
$tolerance  = "3";

# system() returns the exit status shifted left by 8 bits, hence the / 256
$retval = system("ping $target -i 1 -c 5 > /dev/null");
$retval = $retval / 256;
# number of consecutive failed checks so far (empty or missing file counts as 0)
$cur_state = `cat $state_file`;

if ($retval > 0) {
    # ping failed
    $next_state = $cur_state + 1;
    if ($cur_state < $tolerance) {
        system("echo $next_state > $state_file");
    } elsif ($cur_state == $tolerance) {
        system("echo $next_state > $state_file");
        system("route del default gw $primary");
        system("echo \"Dropping default gw ($primary)\" | mail -s \"Route Change (Down)\" $emailaddr");
    } elsif ($cur_state > $tolerance ) {
        # do nothing, already dropped the route!
    }
} else {
    # ping succeeded
    $next_state = 0;
    if ($cur_state == 0) {
        # do nothing
    } elsif ($cur_state <= $tolerance) {
        # reset the counter
        system("echo $next_state > $state_file");
    } else {
        # reset counter and bring up default route
        system("echo $next_state > $state_file");
        system("route add default gw $primary");
        system("echo \"Restoring default gw ($primary)\" | mail -s \"Route Change (UP)\" $emailaddr");
    }
}

Ignore the sloppy coding: the empty if and else {} blocks were left in intentionally, as I was adding logging to the script and wanted to do some other things with it.

The script pings the next hop 5 times. If ping exits with status 0, the link is fine. With the usual Linux (iputils) ping, a non-zero exit status means no replies came back at all, or some other error occurred; partial packet loss still exits 0. Perl's system() reports the exit status shifted left by 8 bits, so a failure shows up as 256 or higher, which is why the script divides by 256. Every time we get a failing retval, we increment the counter stored in /var/state/route-checks. When this counter reaches the tolerance value, we increment the counter again and drop the default route. If we continue to get bad results after that, we ignore them.

If we get a 0 retval and the counter is less than or equal to the tolerance level, we reset the counter to 0. If it's above the tolerance level, it means we've dropped the default route, so we reset the counter and bring the default route back up.
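The counter logic can be written down as a pure function, which makes the transitions easy to check by hand. This is a sketch of the behaviour described here rather than a replacement for the script; next_step is a hypothetical name, and it prints the new counter value followed by the action to take (none, drop, or restore).

```shell
# Sketch of the counter logic as a pure function.
# Arguments: current counter, ping exit status, tolerance.
# Prints "<new counter> <action>", action being none, drop or restore.
next_step() {
    cur=$1
    ping_status=$2
    tol=$3
    if [ "$ping_status" -ne 0 ]; then
        if [ "$cur" -lt "$tol" ]; then
            echo "$((cur + 1)) none"      # still within tolerance
        elif [ "$cur" -eq "$tol" ]; then
            echo "$((cur + 1)) drop"      # threshold hit: drop default route
        else
            echo "$cur none"              # route already dropped
        fi
    else
        if [ "$cur" -gt "$tol" ]; then
            echo "0 restore"              # link is back: re-add the route
        else
            echo "0 none"                 # reset the counter
        fi
    fi
}

next_step 3 1 3   # prints "4 drop"
next_step 4 0 3   # prints "0 restore"
```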

That's pretty simple, and it works. With a tolerance of 3, as shown, it takes four consecutive failed checks, so about 4 minutes, to work out that the link is down and fail over to the backup; the default route is restored on the first successful check after that. Cron can only schedule tasks once a minute, so that's the finest granularity you'll get.
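Assuming the script is installed as /usr/local/sbin/check-internet.pl, the crontab entry would be "* * * * * /usr/local/sbin/check-internet.pl". The arithmetic behind the delay can be sketched as follows; failover_delay is a hypothetical helper, and the figure is a rough upper estimate that ignores the time ping itself takes.

```shell
# Sketch: approximate seconds between the link failing and the route being
# dropped. The route is dropped on failed check number (tolerance + 1).
failover_delay() {
    tolerance=$1
    interval=$2    # seconds between checks (60 under cron)
    echo $(( (tolerance + 1) * interval ))
}

failover_delay 3 60   # prints 240
```

Lowering the tolerance shortens the delay at the cost of reacting to shorter outages.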

Another method is to use Nagios. I wrote an event handler for Nagios, but I never got it working nicely, hence the script above. I can provide it as a sample; however, I think the issues were due to the link going to a hard critical state straight away, rather than spending any length of time in a warning state.
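For illustration, the decision part of such an event handler might look like the sketch below. This is not the event handler mentioned above. It assumes Nagios is configured to pass the $SERVICESTATE$ and $SERVICESTATETYPE$ macros as the first two arguments; the function name and the printed actions are hypothetical, and a real handler would run the route commands instead of printing.

```shell
# Sketch of the decision logic for a Nagios-style event handler.
# Arguments: service state (OK, WARNING, CRITICAL, UNKNOWN) and
# state type (SOFT or HARD). Prints the action a handler would take.
handle_event() {
    state=$1
    state_type=$2
    case "$state:$state_type" in
        CRITICAL:HARD) echo "drop" ;;      # link confirmed down
        OK:HARD)       echo "restore" ;;   # recovered from a hard problem
        *)             echo "none" ;;      # soft or warning states: wait
    esac
}
```

A check that goes straight to a hard critical state would trigger the drop action on its first event, which matches the behaviour described above.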


The following authors of this page have not agreed to the WlugWikiLicense. As such copyright to all content on this page is retained by the original authors.

EeroVolotinen

The following authors of this page have agreed to the WlugWikiLicense.

Version 1, saved on Thursday, February 13, 2003 11:47:01 am by DanielLawson
