Rev | Author | # | Line |
---|---|---|---|
1 | DanielLawson | 1 | I'll describe a simple way of setting up a redundant internet connection. |
2 | |||
2 | CraigBox | 3 | My setup involves my main server being the 'router' for the network. I have another machine elsewhere on the network that has a radio card in it. All traffic is directed to the main server, which then routes it on to the radio machine, which does the [NAT]. I also have another connection: a so-called "Internet Hub", which takes ethernet in and has a serial line out, connected to a modem. This does dial on demand and NAT. |
1 | DanielLawson | 4 | |
5 | Radio machine IP: 10.0.0.254 (name radio) | ||
6 | Internet Hub IP: 10.0.0.253 (name modem) | ||
7 | Server IP: 10.0.0.1 | ||
8 | |||
9 | So I can change which internet connection I use simply by changing the default route on the server. | ||
10 | What I do is set up my default gateway as radio, with a metric of 0: | ||
11 | route add default gw radio metric 0 | ||
12 | I then add the backup route as well, with a higher metric: | ||
2 | CraigBox | 13 | route add default gw modem metric 10 |
1 | DanielLawson | 14 | |
15 | The higher metric means it is used less preferentially. As I don't have load balancing enabled, it doesn't get used at all. | ||
16 | |||
17 | So, if I discover that the radio connection has gone down, I can manually fail over by issuing | ||
18 | route del default gw radio | ||
19 | When it comes back up, I can reinstate it as the default link by doing | ||
20 | route add default gw radio metric 0 | ||
21 | |||
22 | |||
23 | The simplest way of knowing if the link has gone down is to ping a remote host. However, I need to make sure I am pinging said host via the radio link, or else when I drop the default route, the ping will succeed via the backup route. | ||
24 | So I add a route to the next-hop after the radio link. | ||
25 | route add ip.of.next.hop gw radio | ||
26 | I can then ping ip.of.next.hop - if it succeeds, the radio link is up. If it fails, the radio link is down. | ||
27 | |||
28 | |||
29 | I can automate this in a number of ways. The one I settled on was a cron script that runs every minute: | ||
30 | |||
2 | CraigBox | 31 | :/usr/local/sbin# cat check-internet.pl |
1 | DanielLawson | 32 | |
33 | #!/usr/bin/perl | ||
34 | |||
35 | $primary="radio"; | ||
36 | $backup="modem"; | ||
37 | $target="next-hop"; | ||
38 | $emailaddr="daniel"; | ||
39 | |||
40 | $state_file="/var/state/route-checks"; | ||
41 | $tolerance="3"; | ||
42 | |||
43 | $retval=system("ping $target -i 1 -c 5 > /dev/null"); | ||
44 | $retval = $retval / 256; | ||
45 | $cur_state=`cat $state_file`; | ||
46 | |||
47 | if ($retval > 0) { | ||
48 | $next_state = $cur_state + 1; | ||
49 | |||
50 | if ($cur_state < $tolerance) { | ||
51 | system("echo $next_state > $state_file"); | ||
52 | } elsif ($cur_state == $tolerance) { | ||
53 | system("echo $next_state > $state_file"); | ||
54 | system("route del default gw $primary"); | ||
55 | system("echo \"Dropping default gw ($primary)\" | mail -s \"Route Change (Down)\" $emailaddr"); | ||
56 | } elsif ($cur_state > $tolerance ) { | ||
57 | #do nothing, already dropped the route! | ||
58 | } | ||
59 | } else { | ||
60 | $next_state = 0; | ||
61 | |||
62 | if ($cur_state == 0) { | ||
63 | #do nothing | ||
64 | } elsif ($cur_state <= $tolerance) { | ||
65 | # reset the counter | ||
66 | system("echo $next_state > $state_file"); | ||
67 | } else { | ||
68 | #reset counter and bring up default route; | ||
69 | system("echo $next_state > $state_file"); | ||
70 | system("route add default gw $primary"); | ||
71 | system("echo \"Restoring default gw ($primary)\" | mail -s \"Route Change (UP)\" $emailaddr"); | ||
72 | } | ||
73 | } | ||
74 | |||
75 | |||
76 | Ignore the sloppy coding (the empty if or else {} pairs were left in intentionally, as I was adding logging to the script and wanted to do some other things with it). | ||
77 | |||
78 | What this does is ping the next hop 5 times. If ping exits with 0, it's fine. A non-zero exit status shows up in the return value of Perl's system() as 256 or higher, because the exit status sits in the high byte (hence the division by 256). With iputils ping, an exit status of 1 means no replies came back at all, i.e. 100% packet loss; check your ping's man page to be sure. | ||
79 | Every time we get a non-zero retval, we increment the counter stored in /var/state/route-checks. | ||
80 | When this counter reaches the tolerance value, we increment the counter again and drop the default route. | ||
81 | If we continue to get bad results, we ignore them. | ||
82 | |||
83 | If we get a 0 retval and the counter is less than or equal to the tolerance level, we reset the counter to 0. | ||
84 | If it's above the tolerance level, it means we've dropped the default route, so we reset the counter and bring the default route back up. | ||
85 | |||
86 | |||
2 | CraigBox | 87 | That's pretty simple, and it works. With a tolerance of 3, as shown, it takes 4 minutes or so to work out that the link is down and switch to the backup. Cron can only schedule tasks every minute, so that's the finest granularity you'll get. |
1 | DanielLawson | 88 | |
89 | |||
2 | CraigBox | 90 | Another method is to use [Nagios]. I have an event handler that I wrote for Nagios, but I never got it working nicely, hence writing my own above. I can provide it as a sample; however, I think the issues were due to the link going to a hard critical state straight away, and not being in a warning state for any length of time. |
3 | CraigBox | 91 | |
92 | -- DanielLawson | ||
4 | EeroVolotinen | 93 | |
94 | |||
95 | - | ||
96 | |||
97 | Just a little note from Eero; I don't even know if this is correct, but: | ||
98 | |||
99 | http://www.pcquest.com/content/linux/103091901.asp says something about gc_timeout? | ||
100 | |||
101 | If this is correct, I only need to set up two gateway cards and enter something like this: | ||
102 | |||
103 | # route add default gw 192.168.1.2 dev eth0 metric 0 | ||
104 | # route add default gw 192.168.2.2 dev eth1 metric 10 | ||
6 | EeroVolotinen | 105 | # echo 300 > /proc/sys/net/ipv4/route/gc_timeout or sysctl -w net.ipv4.route.gc_timeout=300 |
5 | EeroVolotinen | 106 | |
107 | Usually the best way is to put sysctl settings in /etc/sysctl.conf | ||
4 | EeroVolotinen | 108 | |
109 | and the kernel automagically monitors the links and uses the better one? I haven't tested this yet, but | ||
110 | I will soon. | ||
111 | |||
112 | |||
113 | |||
114 | |||
115 | -- Eero Volotinen (eero@jlug.fi) | ||
3 | CraigBox | 116 | |
117 | ----- | ||
118 | CategoryNetworking |
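
The counter logic in check-internet.pl can be hard to follow from the prose alone, so here is a minimal sketch of just the decision step as a side-effect-free shell function. The name `decide` and its "counter action" output format are made up for illustration and are not part of the original script. It never touches the routing table or the state file; it only reports what the script would do for a given counter value, tolerance, and ping exit status.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): mirrors the counter logic
# of check-internet.pl without changing routes or writing the state file.
# Usage: decide CUR_STATE TOLERANCE PING_STATUS
# Prints "<new counter> <action>", where action is none, drop, or restore.
decide() {
    cur=$1 tol=$2 ping_status=$3
    if [ "$ping_status" -ne 0 ]; then
        # Failed check: count up, dropping the route once tolerance is hit.
        if [ "$cur" -lt "$tol" ]; then
            echo "$((cur + 1)) none"
        elif [ "$cur" -eq "$tol" ]; then
            echo "$((cur + 1)) drop"
        else
            # Route already dropped; leave the counter alone.
            echo "$cur none"
        fi
    else
        # Successful check: reset, restoring the route if it had been dropped.
        if [ "$cur" -gt "$tol" ]; then
            echo "0 restore"
        else
            echo "0 none"
        fi
    fi
}

# Simulate four failed checks followed by one good one, with a tolerance of 3.
state=0
for status in 1 1 1 1 0; do
    set -- $(decide "$state" 3 "$status")
    state=$1
    echo "ping_status=$status counter=$1 action=$2"
done
```

Running the simulation shows the route being dropped on the fourth consecutive failed check and restored on the first good one. At one cron run per minute, that matches the roughly 4 minutes to fail over mentioned above.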