Penguin

(These are some notes I'm jotting down while I work on this. I intend to come back and clean them up later.)

ifup dummy0
tc filter add dev $DEV parent ffff: protocol ip <filter-rule> flowid 1:2 action mirred egress mirror dev dummy0
tcpdump -i dummy0
  • The kernel message "HTB: quantum of class XXXXYYYY is big. Consider r2q change." means that class XXXX:YYYY has a very large quantum. By default the quantum is the rate of the class divided by "r2q". http://www.docum.org/docum.org/faq/cache/31.html
  • filter ... protocol specifies which skb->protocol you're matching; normally skb->protocol == ethertype. If you don't care, in some circumstances you /must/ specify "protocol all". In some situations giving a specific ethertype works; in others it gives an "invalid argument" error.
  • When a packet doesn't match, the sfq's internal classifier always puts it in bucket 0. These users will get abysmal performance under any kind of load. The external "flow" classifier, when a packet doesn't match, drops it altogether. These users get 100% packet loss, with or without load. At least the second problem is obvious during testing; the first is often only discovered after users complain.
  • ifb (Intermediate Functional Block) is a replacement for IMQ. ifb is in the kernel. ref
  • We were getting errors when two rules had the same priority. I think two u32 rules at the same priority are merged into one hash table, and if this is not possible you get an "invalid argument" error. Consider using a unique priority/preference for one of the rules and see if that solves the issue.
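
As a rough illustration of the quantum arithmetic in the HTB note above, the sketch below divides a class rate by r2q. The helper name htb_quantum is hypothetical, the rate is assumed to be given in bits per second (HTB itself works in bytes per second), and the default r2q of 10 is an assumption that should be checked against your own tc and kernel.

```shell
#!/bin/sh
# Hypothetical helper: estimate the quantum HTB derives for a class.
# Assumption: the rate argument is in bits per second.
htb_quantum() {
    rate_bps=$1
    r2q=${2:-10}          # assumed default r2q
    echo $(( rate_bps / 8 / r2q ))
}

htb_quantum 100000000        # 100mbit class with r2q 10
htb_quantum 100000000 100    # same class after raising r2q to 100
```

With r2q 10 the 100mbit class ends up with a quantum of 1250000 bytes; raising r2q to 100 brings that down to 125000. Setting quantum explicitly on the class is the other way to avoid the warning.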

To match PPPoE discovery ethertype:

$TC filter add dev $DEV \
        pref 10 parent $Q_ROOT: \
        protocol all \
        basic match "cmp(u16 at 12 layer link eq $ETH_P_PPPOED)" \
        flowid $Q_ROOT:$C_PPPoE
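
The variables in the filter above are not defined in these notes. A minimal sketch of what they might look like follows. 0x8863 is the registered ethertype for PPPoE discovery; the device name, root handle and class number are placeholders chosen purely for illustration.

```shell
# Assumed values; adjust to your own setup.
TC=tc                 # path to the tc binary
DEV=eth1              # hypothetical interface
Q_ROOT=1              # hypothetical root qdisc handle (the "1" in "1:")
C_PPPoE=10            # hypothetical class for PPPoE discovery traffic
ETH_P_PPPOED=0x8863   # PPPoE discovery ethertype
```
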
  • If you want things to be perfectly fair:
tc qdisc add dev $DEV \
        root handle 1: \
        sfq

tc filter add dev $DEV \
        pref 1 parent 1:1 handle 100 \
        protocol all \
        flow hash keys dst divisor 1024

This will be fair across all destination IP addresses. We have a set of patches to allow this across src/dst mac addresses.

  • In the notation x:y, x is the qdisc number and y is the class number within that qdisc.
  • Some barely documented, but useful help commands:

    • sudo ./tc filter add dev eth1 basic match help
    • sudo ./tc filter add dev eth1 basic match 'cmp(help)'
    • sudo ./tc filter add dev eth1 basic match 'meta(help)'
    • sudo ./tc filter add dev eth1 basic match 'meta(list)'
    • sudo ./tc filter add dev eth1 basic match 'nbyte(help)'
    • sudo ./tc filter add dev eth1 basic match 'u32(help)'
    • sudo ./tc filter add dev eth1 u32 help
    • sudo ./tc filter add dev eth1 tcindex help
    • sudo ./tc filter add dev eth1 rsvp help
    • sudo ./tc filter add dev eth1 flow help
    • sudo ./tc filter add dev eth1 fw help
    • sudo ./tc filter add dev eth1 route help
    • sudo ./tc qdisc add dev eth1 atm help (might not be compiled in) [classful]
    • sudo ./tc qdisc add dev eth1 cbq help [classful]
    • sudo ./tc qdisc add dev eth1 dsmark help [classful]
    • sudo ./tc qdisc add dev eth1 bfifo help [classless]
    • sudo ./tc qdisc add dev eth1 pfifo help [classless]
    • pfifo_fast is a supported qdisc, but takes no options. [classless]
    • sudo ./tc qdisc add dev eth1 gred help [classless]
    • sudo ./tc qdisc add dev eth1 hfsc help [classful]
    • sudo ./tc qdisc add dev eth1 htb help (htb2 is an alias) [classful]
    • sudo ./tc qdisc add dev eth1 ingress help [classless]
    • sudo ./tc qdisc add dev eth1 netem help [classless]
    • sudo ./tc qdisc add dev eth1 prio help [classful]
    • sudo ./tc qdisc add dev eth1 red help [classless]
    • sudo ./tc qdisc add dev eth1 rr help [classless]
    • sudo ./tc qdisc add dev eth1 sfq help [classless]
    • sudo ./tc qdisc add dev eth1 tbf help [classless]
    • sudo ./tc action add action gact help
    • tc action ... ipt ... ?
    • sudo ./tc action add action mirred help
    • sudo ./tc action add action nat help
    • sudo ./tc action add action pedit help (pedit says it has an example, ... it doesn't. But the command does look particularly interesting)
    • sudo ./tc action add action police help
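
On the x:y notation mentioned earlier in this list: both halves are written in hexadecimal, which is easy to trip over (class 1:10 is class number 16, not 10). The hypothetical helper below, which is not part of tc, splits a handle and prints the decimal values along with the combined 32-bit value, assuming the usual layout of the qdisc number in the upper 16 bits and the class number in the lower 16.

```shell
#!/bin/sh
# Hypothetical helper, not part of tc: decode a "major:minor" handle.
decode_handle() {
    major=${1%%:*}
    minor=${1#*:}
    # An empty minor (as in "1:") means 0, i.e. the qdisc itself.
    [ -n "$minor" ] || minor=0
    # tc reads both halves as hexadecimal.
    echo "major=$(( 0x$major )) minor=$(( 0x$minor )) handle=$(( (0x$major << 16) | 0x$minor ))"
}

decode_handle 1:10
decode_handle ffff:
```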

filters

basic

./tc filter add dev eth1 basic match help

basic is anything but. It allows complicated matches to be built up from boolean operations on various criteria (called extended matches, or "ematches"). The syntax for this is "criteria(arguments)". You can use brackets to force precedence, as well as "and", "or" and "not" to combine criteria. Supported extended match modules are "cmp", "meta", "nbyte" and "u32". See tc filter add dev lo basic match 'cmp(help)', tc filter add dev lo basic match 'meta(help)', tc filter add dev lo basic match 'meta(list)', tc filter add dev lo basic match 'nbyte(help)' and tc filter add dev lo basic match 'u32(help)' for the syntax of each.

The cmp extended basic match appears to be the recommended way to match on layer 2 fields (ref)

cmp ematch

This ematch module lets you match on 8, 16 or 32 bit quantities at an offset relative to the link-layer, network-layer or transport-layer header.

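An example (that we didn't have time to get working properly, but it shows a valid syntax). It should match IP packets inside PPPoE sourced from 192.0.2.0/24. Note that cmp(help) gives the numeric LAYER values as 0..2, counted from the link layer, so "layer 2" would select the transport header; "layer link" is used here to address the Ethernet header:

$TC filter add dev $DEV                \
       parent 1: prio 10               \
       protocol all                    \
       basic match "cmp(u16 at 12 layer link eq 0x8864) and cmp(u32 at 34 layer link mask 0xFFFFFF00 eq 0xC0000200)" \
       flowid 1:10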
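
The offset 34 and the constants 0xC0000200 and 0xFFFFFF00 in the example above can be derived with a little arithmetic. The sketch below assumes an untagged Ethernet frame (14-byte header), the 6-byte PPPoE session header and the 2-byte PPP protocol field; the helper names are made up for illustration.

```shell
#!/bin/sh
ETH_HLEN=14        # untagged Ethernet header
PPPOE_HLEN=6       # PPPoE session header
PPP_PROTO_LEN=2    # PPP protocol field
IP_SADDR_OFF=12    # source address offset inside the IPv4 header

# Offset of the inner IPv4 source address from the start of the frame.
echo $(( ETH_HLEN + PPPOE_HLEN + PPP_PROTO_LEN + IP_SADDR_OFF ))

# Hypothetical helper: dotted quad to a 32-bit hex constant.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots
    IFS=$old_ifs
    printf '0x%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: prefix length to a 32-bit netmask.
prefix_to_mask() {
    printf '0x%08X\n' $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

ip_to_hex 192.0.2.0
prefix_to_mask 24
```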

meta ematch

This ematch module lets you match on various attributes of the system (such as load average), or metadata about the packet (such as the firewall mark). tc filter add dev lo basic match meta(list) lists all the possible attributes.

nbyte ematch

When you want to match on a string inside a packet, nbyte is the module for you.
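
A sketch of what an nbyte match might look like, following the "needle at offset layer LAYER" form that nbyte(help) describes. This is untested here and the exact syntax should be treated as an assumption. The needle, the offset of 20 (which assumes a TCP header without options), the prio and the class are all invented for illustration.

```shell
# Assumes $TC and $DEV are set as elsewhere in these notes.
# Try to match the string "GET " directly after a 20-byte TCP header.
$TC filter add dev $DEV \
        parent 1: prio 20 \
        protocol ip \
        basic match 'nbyte("GET " at 20 layer transport)' \
        flowid 1:10
```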

u32 ematch

u32 is the same as the normal u32 match. Being an ematch, it allows for lt, gt or eq matches as well as the usual matches. You can also use the "basic" system to combine it with other ematches in a single rule.

u32

The u32 match appears to be the most frequently used match. It appears that multiple u32 matches on the same "prio" are "stacked" into a single hash table, and errors can occur if they can't be stacked. Try giving them a unique prio. u32 always matches from the "network" (aka IP/IPv6) header. To get at the link layer header, negative offsets can be used as a hack.
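
A sketch of the points above, with made-up addresses, priorities and classes: two u32 filters kept on separate prio values, and a negative offset used to reach back into the Ethernet header. The -2 offset assumes an untagged Ethernet frame, where the ethertype sits in the two bytes immediately before the network header.

```shell
# Assumes $TC and $DEV are set as elsewhere in these notes.
# Each rule gets its own prio to avoid the stacking problem.
$TC filter add dev $DEV parent 1: prio 10 protocol ip \
        u32 match ip dst 192.0.2.0/24 flowid 1:10
$TC filter add dev $DEV parent 1: prio 11 protocol ip \
        u32 match ip src 198.51.100.0/24 flowid 1:20

# Negative offset hack: ethertype 0x8864 (PPPoE session) at -2.
$TC filter add dev $DEV parent 1: prio 12 protocol all \
        u32 match u16 0x8864 0xffff at -2 flowid 1:30
```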

tcindex

This matches on skb->tc_index. I don't know what this is used for.

rsvp

Match on RSVP flow labels. (ref)

flow

This is an extremely useful classifier that allows classifying packets into queues inside an SFQ.

Example:

$TC filter add dev $DEV                         \
        parent 1: prio 1                        \
        handle 2                                \
        protocol all                            \
        flow hash keys dst divisor 1024

This rule changes the SFQ classifier from the internal one to one that only hashes on destination address. This will share bandwidth fairly between destination IPs, instead of between 5-tuple flows.

One caveat discovered with the external classifier is that if a packet doesn't match, it will get dropped from the sfq, whereas the default behavior of the SFQ's internal hashing algorithm is to place packets it can't classify in bucket 0. While the external classifier makes this obvious during testing (100% packet loss), the internal classifier will only show horrible performance when the sfq is under load (and many other buckets are in use).

The divisor is the divisor of a modulo operation. It must be less than or equal to the hash size configured in the SFQ that this is classifying for. The SFQ size is defined at compile time and defaults to 1,024 elements, so set the divisor to 1024.
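
To make the role of the divisor concrete, the sketch below reduces a few values modulo 1024. The inputs are arbitrary numbers picked for illustration, not the output of the kernel's hash function, and bucket_for is a hypothetical helper.

```shell
#!/bin/sh
DIVISOR=1024   # must not exceed the SFQ's hash table size

# Hypothetical helper: map a hash value to an SFQ bucket.
bucket_for() {
    echo $(( $1 % DIVISOR ))
}

bucket_for 5
bucket_for 1024
bucket_for 123456789
```

Whatever the hash value, the result always lands in the range 0 to 1023, which is why the divisor has to fit within the SFQ's table.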

The flow keys can be src (source ip), dst (destination ip), proto (ip protocol), proto-src (transport protocol source port), proto-dst (transport protocol destination port), iif (input interface), priority (?), mark (firewall mark), nfct (netfilter conntrack?), nfct-src (original netfilter source), nfct-dst (original netfilter destination), nfct-proto-src (original netfilter conntrack transport protocol src), nfct-proto-dst (and so on), rt-classid (?), sk-uid (uid from the skbuff), sk-gid (gid from the skbuff), vlan-tag. At WAND we have extended this to include mac-src, mac-dst, mac-proto.

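This also supports or/and/xor/rshift/addend NUM. In the help output these are listed as operations on a single key in the "map" form of the classifier (flow map key KEY), not the "hash" form used above. I don't know why they are there; possibly to allow you to attach multiple classifiers to the same sfq, and then limit them to different parts of the hash table?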

fw

This match module matches only on the fwmark. It uses the "handle" to select which firewall mark to match. Internally this uses a hash table, so multiple fwmarks at the same prio appear to be able to "stack".

Example:

$TC filter add dev $DEV                 \
        parent 1: prio 2                \
        protocol ip                     \
        handle $FWMARK                  \
        fw                              \
        flowid 1:10
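
The fw filter only sees marks that something else has set. One common way to set them is the iptables MARK target; the sketch below marks traffic to a made-up destination network. The mark value 6 and the choice of the mangle POSTROUTING chain are assumptions for illustration.

```shell
FWMARK=6   # hypothetical mark value, reused as the fw filter handle

# Mark packets headed for the example network so the fw filter can match them.
iptables -t mangle -A POSTROUTING -d 192.0.2.0/24 \
        -j MARK --set-mark $FWMARK
```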

route

This match module allows matching on "realms". Realms are tags that can be applied to routes. It supports matching on from realm, fromif tag and to realm. I've not experimented with this match, but several other people have. This seems to be the easiest way to match routes from quagga (e.g. national vs international).
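
A sketch of how a realm might be attached to a route and then matched. The gateway, prefix, realm number, prio and class are all invented for illustration, and like the rest of this section it has not been tested here.

```shell
# Assumes $TC and $DEV are set as elsewhere in these notes.
# Tag a route with realm 10.
ip route add 192.0.2.0/24 via 10.0.0.1 dev $DEV realm 10

# Classify packets routed via realm 10 into class 1:10.
$TC filter add dev $DEV parent 1: prio 100 protocol ip \
        route to 10 flowid 1:10
```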