| Rev | Author | # | Line |
|---|---|---|---|
| 1 | DanielLawson | 1 | I'll describe a simple way of setting up a redundant internet connection. |
| 2 | |||
| 3 | My setup involves my main server being the 'router' for the network. I have another machine elsewhere on the network that has a radio card in it. All traffic is directed to the main server, which then routes it on to the radio machine, which does the [NAT]. I also have another connection: a so-called "Internet Hub" which takes ethernet in and has a serial line out, connected to a modem. This does dial on demand and NAT. | ||
| 4 | |||
| 5 | Radio machine IP: 10.0.0.254 (name radio) | ||
| 6 | Internet Hub IP: 10.0.0.253 (name modem) | ||
| 7 | Server IP: 10.0.0.1 | ||
| 8 | |||
| 9 | So I can change which internet connection I use simply by changing the default route on the server. | ||
| 10 | I set up radio as my default gateway, with a metric of 0: | ||
| 11 | route add default gw radio metric 0 | ||
| 12 | I then add the backup route as well, with a higher metric: | ||
| 13 | route add default gw modem metric 10 | ||
| 14 | |||
| 15 | The higher metric makes this route less preferred. As I don't have load balancing enabled, it doesn't get used at all while the primary route is in place. | ||
| 16 | |||
| 17 | So, if I discover that the radio connection has gone down, I can manually fail over by issuing | ||
| 18 | route del default gw radio | ||
| 19 | When it comes back up, I can reinstate it as the default link by doing | ||
| 20 | route add default gw radio metric 0 | ||
| 21 | |||
| 22 | |||
| 23 | The simplest way of knowing if the link has gone down is to ping a remote host. However, I need to make sure I am pinging said host via the radio link, or else when I drop the default route, the ping will succeed via the backup route. | ||
| 24 | So I add a route to the next-hop after the radio link. | ||
| 25 | route add ip.of.next.hop gw radio | ||
| 26 | I can then ping ip.of.next.hop - if it succeeds, the radio link is up. If it fails, the radio link is down. | ||
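
The ping check described above can be sketched as a small shell function. This is only an illustration, not part of the original setup: the function name `link_is_up` and the `PING_CMD` override are hypothetical, added so the check can be dry-run without touching the network. It assumes the host route to the next hop has already been pinned to the radio gateway as shown.

```shell
#!/bin/sh
# Sketch only. PING_CMD is a hypothetical override (defaults to the real ping)
# so the function can be exercised without sending any packets.
link_is_up() {
    # $1 = an address that is only reachable via the primary (radio) link,
    #      i.e. the next hop that has its own pinned host route.
    # Succeeds (exit status 0) if ping succeeds, fails otherwise.
    ${PING_CMD:-ping} -c 5 -i 1 "$1" > /dev/null 2>&1
}
```

It could then be used as `link_is_up ip.of.next.hop && echo "radio up" || echo "radio down"`.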
| 27 | |||
| 28 | |||
| 29 | I can automate this in a number of ways. The one I settled on is a cron script that runs every minute: | ||
| 30 | |||
| 31 | :/usr/local/sbin# cat check-internet.pl | ||
| 32 | |||
| 33 | #!/usr/bin/perl | ||
| 34 | |||
| 35 | $primary="radio"; | ||
| 36 | $backup="modem"; | ||
| 37 | $target="next-hop"; | ||
| 38 | $emailaddr="daniel"; | ||
| 39 | |||
| 40 | $state_file="/var/state/route-checks"; | ||
| 41 | $tolerance="3"; | ||
| 42 | |||
| 43 | $retval=system("ping $target -i 1 -c 5 > /dev/null"); | ||
| 44 | $retval = $retval / 256; | ||
| 45 | $cur_state=`cat $state_file`; | ||
| 46 | |||
| 47 | if ($retval > 0) { | ||
| 48 | $next_state = $cur_state + 1; | ||
| 49 | |||
| 50 | if ($cur_state < $tolerance) { | ||
| 51 | system("echo $next_state > $state_file"); | ||
| 52 | } elsif ($cur_state == $tolerance) { | ||
| 53 | system("echo $next_state > $state_file"); | ||
| 54 | system("route del default gw $primary"); | ||
| 55 | system("echo \"Dropping default gw ($primary)\" | mail -s \"Route Change (Down)\" $emailaddr"); | ||
| 56 | } elsif ($cur_state > $tolerance ) { | ||
| 57 | #do nothing, already dropped the route! | ||
| 58 | } | ||
| 59 | } else { | ||
| 60 | $next_state = 0; | ||
| 61 | |||
| 62 | if ($cur_state == 0) { | ||
| 63 | #do nothing | ||
| 64 | } elsif ($cur_state <= $tolerance) { | ||
| 65 | # reset the counter | ||
| 66 | system("echo $next_state > $state_file"); | ||
| 67 | } else { | ||
| 68 | #reset counter and bring up default route; | ||
| 69 | system("echo $next_state > $state_file"); | ||
| 70 | system("route add default gw $primary"); | ||
| 71 | system("echo \"Restoring default gw ($primary)\" | mail -s \"Route Change (UP)\" $emailaddr"); | ||
| 72 | } | ||
| 73 | } | ||
| 74 | |||
| 75 | |||
| 76 | Ignore the sloppy coding: the empty if/else {} branches were left in intentionally, as I was adding logging to the script and wanted to do some other things with it. | ||
| 77 | |||
| 78 | What this does is ping the next hop 5 times. system() returns ping's exit status multiplied by 256, which is why the script divides by 256. If ping exits with 0, the link is fine. A non-zero status means the check failed: the iputils ping man page documents exit status 1 when no replies were received at all, and 2 for other errors. | ||
| 79 | Every time retval is non-zero, we increment the counter stored in /var/state/route-checks. | ||
| 80 | When this counter reaches the tolerance value, we increment the counter once more and drop the default route. | ||
| 81 | If we continue to get bad results after that, we ignore them. | ||
| 82 | |||
| 83 | If we get a 0 retval and the counter is less than or equal to the tolerance level, we reset the counter to 0. | ||
| 84 | If it's above the tolerance level, it means we've dropped the default route, so we reset the counter and bring the default route back up. | ||
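
The counter logic described above can be isolated into a pure function, which makes it easy to try out without changing any routes. The following is a minimal sketch in shell rather than the Perl of the script; the function name `decide` and its output format are assumptions made for illustration. It only prints what should happen and leaves the actual `route` and `mail` calls to the caller.

```shell
#!/bin/sh
# decide CUR_STATE PING_FAILED TOLERANCE
# Prints "<new_counter> <action>", where action is none, drop or restore.
# PING_FAILED is 0 when the ping succeeded and non-zero when it failed.
# Note: plain POSIX sh has no local variables, so these names are global.
decide() {
    cur=$1
    failed=$2
    tol=$3
    if [ "$failed" -ne 0 ]; then
        if [ "$cur" -lt "$tol" ]; then
            echo "$((cur + 1)) none"      # still within tolerance: just count
        elif [ "$cur" -eq "$tol" ]; then
            echo "$((cur + 1)) drop"      # tolerance reached: drop primary route
        else
            echo "$cur none"              # route already dropped: nothing to do
        fi
    else
        if [ "$cur" -gt "$tol" ]; then
            echo "0 restore"              # link is back: restore primary route
        else
            echo "0 none"                 # healthy (or recovered in time): reset
        fi
    fi
}
```

For example, `decide 3 1 3` prints `4 drop` and `decide 4 0 3` prints `0 restore`.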
| 85 | |||
| 86 | |||
| 87 | That's pretty simple, and it works. With a tolerance of 3, as shown, though, it takes 4 minutes or so to work out that the link is down and fail over to the backup. Cron can only schedule tasks once a minute, so that's the finest granularity you'll get. | ||
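
The "4 minutes or so" figure follows from the counter logic: the route is dropped on the failed check that finds the counter already at the tolerance value, which is the fourth consecutive failure when the tolerance is 3. A rough worked version of that arithmetic, where the function name `detection_delay` is hypothetical and the time taken by ping itself is ignored:

```shell
#!/bin/sh
# detection_delay TOLERANCE INTERVAL_SECONDS
# Prints an approximate upper bound, in seconds, on the time between the
# link failing and the route being dropped: TOLERANCE + 1 check intervals.
detection_delay() {
    echo $(( ($1 + 1) * $2 ))
}
```

With the values from the script, `detection_delay 3 60` prints `240`, i.e. four minutes.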
| 88 | |||
| 89 | |||
| 90 | Another method is to use [Nagios]. I wrote an event handler for Nagios, but I never got it working nicely, hence writing my own script above. I can provide it as a sample; however, I think the issues were due to the link going to a hard critical state straight away, and not being in a warning state for any length of time. | ||
| 3 | CraigBox | 91 | |
| 92 | -- DanielLawson | ||
| 4 | EeroVolotinen | 93 | |
| 94 | |||
| 95 | - | ||
| 96 | |||
| 97 | Just a little note from Eero. I don't even know if this is correct, but: | ||
| 98 | |||
| 99 | http://www.pcquest.com/content/linux/103091901.asp says something about gc_timeout? | ||
| 100 | |||
| 101 | If this is correct, I only need to set up two gateway cards and enter something like this: | ||
| 102 | |||
| 103 | # route add default gw 192.168.1.2 dev eth0 metric 0 | ||
| 104 | # route add default gw 192.168.2.2 dev eth1 metric 10 | ||
| 105 | # echo 300 > /proc/sys/net/ipv4/route/gc_timeout (or: sysctl -w net.ipv4.route.gc_timeout=300) | ||
| 106 | |||
| 107 | Usually the best way is to put sysctl settings in /etc/sysctl.conf | ||
| 108 | |||
| 109 | and the kernel automagically monitors the links and uses the better one? I haven't tested this yet, but | ||
| 110 | I will soon. | ||
| 111 | |||
| 112 | |||
| 113 | |||
| 114 | |||
| 115 | -- Eero Volotinen (eero@jlug.fi) | ||
| 3 | CraigBox | 116 | |
| 117 | ----- | ||
| 118 | CategoryNetworking |