Blame: TrafficControl
Annotated edit history of TrafficControl version 13, including all changes.
Rev Author # Line
7 PerryLorier 1 (These are some notes I'm jotting down while I work on this; I intend to come back and clean them up later.)
2
3 * u32 is dumb: http://lists.openwall.net/netdev/2007/08/15/65
4 * no one knows what "tc action" does.
8 PerryLorier 5 * basic match meta filter module looks awesome http://lwn.net/Articles/119536/
6 * maybe use basic match cmp() to match on ethertype, mac addresses and other layer 2 miscellanea.
7 * for debugging filters you can use actions (untested). eg:
8 <verbatim>
9 ifup dummy0
10 tc filter add dev $DEV parent ffff: protocol ip <i>filter-rule</i> flowid 1:2 action mirred egress mirror dev dummy0
11 tcpdump -i dummy0
12 </verbatim>
7 PerryLorier 13 * <tt>HTB: quantum of class XXXXYYYY is big. Consider r2q change.</tt> means class XXXX:YYYY has a massive quantum. quantum by default is the rate of the class, divided by "r2q". http://www.docum.org/docum.org/faq/cache/31.html
14 * filter...protocol specifies which skb->protocol you're talking about, normally skb->protocol == ethertype. If you don't care you /must/ in some circumstances specify "protocol all". In some situations protocol <i>ethertype</i> works, in some situations it gives an invalid argument.
15 * When a packet doesn't match, the sfq's internal classifier always puts it in bucket 0, so those users get abysmal performance under any kind of load. The external "flow" classifier instead drops a packet that doesn't match, so those users get 100% packet loss, with or without load. At least the second problem is obvious during testing; the first is often only discovered after users whine.
16 * <tt>ifb</tt> (Intermediate Functional Block) is a replacement for IMQ. ifb is in the kernel. [ref|http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=drivers/net/ifb.c;hb=HEAD]
9 PerryLorier 17 * We were getting errors having two rules at the same priority. I think having two u32 rules at the same priority tries to merge them into one hashtable, if this is not possible you get an "invalid argument". Consider using a unique priority/preference for one of the rules and see if that solves the issue.
7 PerryLorier 18
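The quantum arithmetic behind the HTB warning above can be sketched in shell. This is only a rough check: it assumes the rate is expressed in bytes per second, that r2q is left at its default of 10, and that the warning appears once the derived quantum exceeds 200000 bytes. The helper name <tt>htb_quantum</tt> is made up for illustration.

```shell
# Hypothetical helper: estimate the quantum HTB derives for a class.
# $1 = class rate in bytes per second, $2 = r2q (assumed default: 10)
htb_quantum() {
    rate_bytes=$1
    r2q=${2:-10}
    echo $(( rate_bytes / r2q ))
}

# 100mbit is 12500000 bytes/s; with r2q 10 the quantum is 1250000 bytes,
# well above the assumed 200000-byte limit that triggers the warning.
htb_quantum 12500000

# Raising r2q to 100 brings the same class down to 125000 bytes.
htb_quantum 12500000 100
```

If the numbers come out too large, either raise <tt>r2q</tt> on the htb qdisc or give the class an explicit <tt>quantum</tt>.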
19 To match PPPoE discovery ethertype:
20 <verbatim>
21 $TC filter add dev $DEV \
22 pref 10 parent $Q_ROOT: \
23 protocol all \
24 basic match "cmp(u16 at 12 layer 2 eq $ETH_P_PPPOED)" \
25 flowid $Q_ROOT:$C_PPPoE
26 </verbatim>
27
28 * If you want things to be perfectly fair:
29 <verbatim>
30 tc qdisc add dev $DEV \
31 root handle 1: \
32 sfq
33
34 tc filter add dev $DEV \
35 pref 1 parent 1: handle 100 \
36 protocol all \
37 flow hash keys dst divisor 1024
38 </verbatim>
39 This will be fair across all destination IP addresses. We have a set of patches to allow this across src/dst mac addresses.
40
41 * In the notation x:y, x is the qdisc number and y is the class number within that qdisc.
42
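One thing that is easy to trip over with the x:y notation is that tc reads both halves as hexadecimal. A minimal sketch that splits a handle into decimal numbers (the helper name <tt>split_handle</tt> is hypothetical, not part of tc):

```shell
# Hypothetical helper: split a tc handle "major:minor" into decimal numbers.
# tc reads both halves as hexadecimal, so 1:10 is class 16 of qdisc 1.
split_handle() {
    major=${1%%:*}
    minor=${1#*:}
    # An empty minor (as in "1:") names the qdisc itself; treat it as 0.
    printf '%d %d\n' "0x${major:-0}" "0x${minor:-0}"
}

split_handle 1:10     # class 0x10 of qdisc 1
split_handle ffff:    # the handle used for the ingress qdisc
```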
43 * Some barely documented, but useful help commands:
44 ** <tt>sudo ./tc filter add dev eth1 basic match help</tt>
45 ** <tt>sudo ./tc filter add dev eth1 basic match 'cmp(help)'</tt>
46 ** <tt>sudo ./tc filter add dev eth1 basic match 'meta(help)'</tt>
47 ** <tt>sudo ./tc filter add dev eth1 basic match 'meta(list)'</tt>
48 ** <tt>sudo ./tc filter add dev eth1 basic match 'nbyte(help)'</tt>
49 ** <tt>sudo ./tc filter add dev eth1 basic match 'u32(help)'</tt>
50 ** <tt>sudo ./tc filter add dev eth1 u32 help</tt>
51 ** <tt>sudo ./tc filter add dev eth1 tcindex help</tt>
52 ** <tt>sudo ./tc filter add dev eth1 rsvp help</tt>
53 ** <tt>sudo ./tc filter add dev eth1 flow help</tt>
54 ** <tt>sudo ./tc filter add dev eth1 fw help</tt>
55 ** <tt>sudo ./tc filter add dev eth1 route help</tt>
56 ** <tt>sudo ./tc qdisc add dev eth1 atm help</tt> (might not be compiled in) ~[classful]
57 ** <tt>sudo ./tc qdisc add dev eth1 cbq help</tt> ~[classful]
58 ** <tt>sudo ./tc qdisc add dev eth1 dsmark help</tt> ~[classful]
59 ** <tt>sudo ./tc qdisc add dev eth1 bfifo help</tt> ~[classless]
60 ** <tt>sudo ./tc qdisc add dev eth1 pfifo help</tt> ~[classless]
61 ** pfifo_fast is a supported qdisc, but takes no options. ~[classless]
62 ** <tt>sudo ./tc qdisc add dev eth1 gred help</tt> ~[classless]
63 ** <tt>sudo ./tc qdisc add dev eth1 hfsc help</tt> ~[classful]
64 ** <tt>sudo ./tc qdisc add dev eth1 htb help</tt> (htb2 is an alias) ~[classful]
65 ** <tt>sudo ./tc qdisc add dev eth1 ingress help</tt> ~[classless]
66 ** <tt>sudo ./tc qdisc add dev eth1 netem help</tt> ~[classless]
67 ** <tt>sudo ./tc qdisc add dev eth1 prio help</tt> ~[classful]
68 ** <tt>sudo ./tc qdisc add dev eth1 red help</tt> ~[classless]
69 ** <tt>sudo ./tc qdisc add dev eth1 rr help</tt> ~[classful]
70 ** <tt>sudo ./tc qdisc add dev eth1 sfq help</tt> ~[classless]
71 ** <tt>sudo ./tc qdisc add dev eth1 tbf help</tt> ~[classless]
72 ** <tt>sudo ./tc action add action gact help</tt>
73 ** tc action ... ipt ... ?
74 ** <tt>sudo ./tc action add action mirred help</tt>
75 ** <tt>sudo ./tc action add action nat help</tt>
76 ** <tt>sudo ./tc action add action pedit help</tt> (pedit says it has an example, ... it doesn't. But the command does look particularly interesting)
77 ** <tt>sudo ./tc action add action police help</tt>
10 PerryLorier 78
79 !!!filters
80 !!basic
81 <tt>./tc filter add dev eth1 basic match help</tt>
82
83 basic is anything but. It allows complicated matches to be built up from boolean operations on various criteria (called extended matches, or "ematches"). The syntax for each is "criteria(arguments)". You can use brackets to force precedence, and "and", "or" and "not" to combine criteria. Supported extended match modules are "cmp", "meta", "nbyte" and "u32". See <tt>tc filter add dev lo basic match 'cmp(help)'</tt>, <tt>tc filter add dev lo basic match 'meta(help)'</tt>, <tt>tc filter add dev lo basic match 'meta(list)'</tt>, <tt>tc filter add dev lo basic match 'nbyte(help)'</tt> and <tt>tc filter add dev lo basic match 'u32(help)'</tt> for suggested syntaxes. (The quotes stop the shell from interpreting the brackets.)
84
85 The cmp ematch appears to be the recommended way to match on layer 2 fields ([ref|http://lists.openwall.net/netdev/2007/08/15/63])
86
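As a sketch of how ematches combine, the snippet below builds an expression that matches either PPPoE ethertype and prints the resulting command rather than running it. It is untested against a live interface. The helper <tt>ethertype_is</tt>, the variable <tt>ETH_P_PPPOES</tt>, the device <tt>eth1</tt> and the handles are all assumptions made for illustration, and the offset assumes an untagged Ethernet frame.

```shell
# Hypothetical helper: build a cmp() term matching the ethertype of an
# untagged Ethernet frame (a 16-bit field 12 bytes into the layer 2 header).
ethertype_is() {
    echo "cmp(u16 at 12 layer 2 eq $1)"
}

ETH_P_PPPOED=0x8863   # PPPoE discovery
ETH_P_PPPOES=0x8864   # PPPoE session (variable name is an assumption)

# Combine two terms with "or"; the whole expression is passed as one quoted argument.
MATCH="$(ethertype_is $ETH_P_PPPOED) or $(ethertype_is $ETH_P_PPPOES)"

# Print the command instead of running it, so it can be inspected first.
echo tc filter add dev eth1 parent 1: prio 10 protocol all basic match "'$MATCH'" flowid 1:10
```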
87 !cmp ematch
88 This ematch module lets you match on various 8, 16 or 32 bit quantities at offsets relative to the layer 2, layer 3 or transport headers.
12 PerryLorier 89
90 An example (we didn't have time to get it to work properly, but it shows a valid syntax). This should match IP packets inside PPPoE sourced from 192.0.2.0/24:
91 <verbatim>
92 $TC filter add dev $DEV \
93 parent 1: prio 10 \
94 protocol all \
95 basic match "cmp(u16 at 12 layer 2 eq 0x8864) and cmp(u32 at 34 layer 2 mask 0xFFFFFF00 eq 0xC0000200)" \
96 flowid 1:10
97 </verbatim>
98
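The constants in that example can be derived rather than typed by hand. The sketch below assumes an untagged Ethernet frame (14 bytes), a 6-byte PPPoE header and a 2-byte PPP protocol field in front of the IPv4 header, whose source address sits 12 bytes in. The helper name <tt>ip_to_hex</tt> is made up for illustration.

```shell
# Hypothetical helper: turn a dotted-quad IPv4 address into the 32-bit hex
# constant that cmp() compares against.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    printf '0x%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Offset of the IPv4 source address from the start of the Ethernet header:
# 14 (Ethernet) + 6 (PPPoE) + 2 (PPP protocol field) + 12 (into the IPv4 header)
SRC_OFFSET=$(( 14 + 6 + 2 + 12 ))

echo "cmp(u32 at $SRC_OFFSET layer 2 mask $(ip_to_hex 255.255.255.0) eq $(ip_to_hex 192.0.2.0))"
```

Note that neither this term nor the original example checks that the PPP payload is actually IPv4.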
10 PerryLorier 99 !meta ematch
100 This ematch module lets you match on various attributes of the system (such as load average), or metadata about the packet (such as the firewall mark).
101 <tt>tc filter add dev lo basic match 'meta(list)'</tt> lists all the possible attributes.
102 !nbyte ematch
103 When you want to match on a string inside a packet, nbyte is the module for you.
104 !u32 ematch
105 The u32 ematch works like the normal u32 match. Being an ematch, it allows lt, gt or eq comparisons as well as the usual matches, and the "basic" system lets you combine it with other ematches in a single rule.
106
107 !!u32
108 The u32 match appears to be the most frequently used match. Multiple u32 matches on the same "prio" appear to be "stacked" into a single hash table, and errors can occur if they can't be stacked. If that happens, try giving each a unique prio. u32 always matches from the network (IPv4/IPv6) header. To get at the link layer header, negative offsets can be used as a hack.
109
110 !!tcindex
13 PerryLorier 111 This matches on skb->tc_index, which is normally set by the dsmark qdisc from the packet's DS field for DiffServ classification. ([ref|http://www.opalsoft.net/qos/DS-210.htm])
10 PerryLorier 112
113 !!rsvp
114 Match on RSVP flow labels. ([ref|http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/cls_rsvp.h;hb=HEAD])
115
116 !!flow
117 This is an extremely useful classifier that allows classifying packets into queues inside an <tt>SFQ</tt>.
118
119 Example:
120 <verbatim>
121 $TC filter add dev $DEV \
122 parent 1: prio 1 \
123 handle 2 \
124 protocol all \
125 flow hash keys dst divisor 1024
126 </verbatim>
127 This rule changes the SFQ classifier from the internal one to one that only matches on destination address. This shares bandwidth fairly between destination IPs, instead of between 5-tuple flows.
128
129 One caveat with the external flow classifier is that a packet that doesn't match is dropped by the sfq, whereas the SFQ's internal hashing algorithm places packets it can't classify in bucket 0. The external classifier makes this obvious during testing (100% packet loss), but the internal classifier only shows horrible performance when the sfq is under load (and many other buckets are in use).
130
11 PerryLorier 131 The divisor is the divisor of a modulo operation. It must be equal to or smaller than the hash size configured in the SFQ that this is classifying for. The SFQ size is defined at compile time and defaults to 1,024 elements, so set the divisor to 1024.
10 PerryLorier 132
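The modulo step itself is simple. The real hash is computed inside the kernel, so the sketch below only illustrates how a hash value is folded into a bucket number; the helper name <tt>bucket_for</tt> and the sample hash values are made up.

```shell
# Hypothetical illustration: fold a hash value into an SFQ bucket number.
# $1 = hash value, $2 = divisor (assumed default: 1024)
bucket_for() {
    hash=$1
    divisor=${2:-1024}
    echo $(( hash % divisor ))
}

bucket_for 5000        # bucket with the full 1024-entry table
bucket_for 5000 256    # a smaller divisor only ever uses buckets 0..255
```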
133 The flow keys can be src (source ip), dst (destination ip), proto (ip protocol), proto-src (transport protocol source port), proto-dst (transport protocol destination port), iif (input interface), priority (the packet's skb->priority), mark (firewall mark), nfct (netfilter conntrack entry), nfct-src (original netfilter source), nfct-dst (original netfilter destination), nfct-proto-src (original netfilter conntrack transport protocol source port), nfct-proto-dst (and so on), rt-classid (routing realm), sk-uid (uid of the socket that owns the packet), sk-gid (gid of that socket), vlan-tag. At [WAND] we have extended this to include mac-src, mac-dst, mac-proto.
134
11 PerryLorier 135 This also supports <tt>or</tt>/<tt>and</tt>/<tt>xor</tt>/<tt>rshift</tt>/<tt>addend</tt> <i>NUM</i>. I don't know why this is here; possibly to allow you to attach multiple classifiers to the same sfq and then limit them to different parts of the hash table?
10 PerryLorier 136
137 !!fw
138 This match module matches only on the fwmark. It uses the "handle" to select which firewall mark to match. Internally this uses a hash table, so multiple fwmarks at the same prio appear to be able to "stack".
139
140 Example:
141 <verbatim>
142 $TC filter add dev $DEV \
143 parent 1: prio 2 \
144 protocol ip \
145 handle $FWMARK \
146 fw \
147 flowid 1:10
148 </verbatim>
149
150 !!route
11 PerryLorier 151 This match module allows matching on "realms". Realms are tags that can be applied to routes. It supports matching <tt>from <i>realm</i></tt>, <tt>fromif <i>tag</i></tt> and <tt>to <i>realm</i></tt>. I've not experimented with this match, but [several other people have|http://www.meta.net.nz/~daniel/blog/2008/09/24/qos-and-ip-accounting-with-bgp-under-linux/]. This seems to be the easiest way to match routes from Quagga (e.g. national vs international).