Before you start: if you want an authoritative guide to the following material, read The Linux Advanced Routing and Traffic Control HOWTO instead.

This is a brief but concerted effort to minimise the effect of TCP traffic on interactive applications, using QoS under Linux.

The major motivating factor behind this work was the effect of the new queuing imposed before the bandwidth throttling at the RAN on Telecom's network in early 2004.

I live in a flat where three of us each pay a third of the internet bill, so each of us should have an even crack at the bandwidth.

I devised two queuing structures that should give a decent balance between users, one for egress on the internal interface and one for egress on the external interface.

Egress on internal interface (eth0).

                 Root Class
                  Qdisc 1: PRIO
                 ----------------------------------------------
                 | Class 1:1: PRIO Queue priority 1           |
                 |   Qdisc 10: CBQ 128kbit                    |
                 | ------------------------------------------ |
                 | | Class 10:1(CBQ Class, 128kbit)         | |
                 | |  Qdisc 101: SFQ                        | |
                 | | -------------------------------------- | |
                 | | |  High Priority Per-User Queue      | | |
                 | | |  For interactive applications      | | |
                 | | |  and ICMP etc.                     | | |
                 | | -------------------------------------- | |
                 | | Class 10:2(CBQ Class, 128kbit)         | |
                 | |  Qdisc 102: SFQ                        | |
                 | | -------------------------------------- | |
                 | | |  High Priority Per-User Queue      | | |
                 | | |  For interactive applications      | | |
                 | | |  and ICMP etc.                     | | |
                 | | -------------------------------------- | |
                 | | Class 10:x(CBQ Class, 128kbit)         | |
                 | |  Qdisc 10x: SFQ                        | |
                 | | -------------------------------------- | |
                 | | |  High Priority Per-User Queue      | | |
                 | | |  For interactive applications      | | |
                 | | |  and ICMP etc.                     | | |
                 | | -------------------------------------- | |
                 | | Class 10:5(CBQ Class, 120kbit)         | |
                 | |  Qdisc 105: TBF (Rate 96kbit)          | |
                 | | -------------------------------------- | |
                 | | |  Low Priority Per-User Queue       | | |
    Routing ->   | | |  For TCP and other non-interactive | | |     -> Network card
                 | | |  applications.                     | | |
                 | | -------------------------------------- | |
                 | | Class 10:6(CBQ Class, 120kbit)         | |
                 | |  Qdisc 106: TBF (Rate 96kbit)          | |
                 | | -------------------------------------- | |
                 | | |  Low Priority Per-User Queue       | | |
                 | | |  For TCP and other non-interactive | | |
                 | | |  applications.                     | | |
                 | | -------------------------------------- | |
                 | | Class 10:x(CBQ Class, 120kbit)         | |
                 | |  Qdisc 10x: TBF (Rate 96kbit)          | |
                 | | -------------------------------------- | |
                 | | |  Low Priority Per-User Queue       | | |
                 | | |  For TCP and other non-interactive | | |
                 | | |  applications.                     | | |
                 | | -------------------------------------- | |
                 | ------------------------------------------ |
                 | Class 1:2 (PRIO band priority 2)           |
                 |  Qdisc 20: SFQ                             |
                 | ------------------------------------------ |
                 | |         Bypass Queue for internal      | |
                 | |               Traffic                  | |
                 | ------------------------------------------ |
                 |                                            |
                 ----------------------------------------------


Egress on external interface (ppp0).

                   Root Class
                     Qdisc 10: CBQ 128kbit
                   ------------------------------------------
                   | Class 10:1(CBQ Class, 128kbit)         |
                   |  Qdisc 101: SFQ                        |
                   | -------------------------------------- |
                   | |  High Priority Per-User Queue      | |
                   | |  For interactive applications      | |
                   | |  and ICMP etc.                     | |
                   | -------------------------------------- |
                   | Class 10:2(CBQ Class, 128kbit)         |
                   |  Qdisc 102: SFQ                        |
                   | -------------------------------------- |
                   | |  High Priority Per-User Queue      | |
                   | |  For interactive applications      | |
                   | |  and ICMP etc.                     | |
                   | -------------------------------------- |
                   | Class 10:x(CBQ Class, 128kbit)         |
                   |  Qdisc 10x: SFQ                        |
                   | -------------------------------------- |
                   | |  High Priority Per-User Queue      | |
                   | |  For interactive applications      | |
                   | |  and ICMP etc.                     | |
                   | -------------------------------------- |
                   | Class 10:5(CBQ Class, 120kbit)         |
                   |       Qdisc 105: TBF (Rate 96kbit)     |
                   | -------------------------------------- |
                   | |  Low Priority Per-User Queue       | |
    Routing ->     | |  For TCP and other non-interactive | |      -> ADSL modem
                   | |  applications.                     | |
                   | -------------------------------------- |
                   | Class 10:6(CBQ Class, 120kbit)         |
                   |       Qdisc 106: TBF (Rate 96kbit)     |
                   | -------------------------------------- |
                   | |  Low Priority Per-User Queue       | |
                   | |  For TCP and other non-interactive | |
                   | |  applications.                     | |
                   | -------------------------------------- |
                   | Class 10:x(CBQ Class, 120kbit)         |
                   |       Qdisc 10x: TBF (Rate 96kbit)     |
                   | -------------------------------------- |
                   | |  Low Priority Per-User Queue       | |
                   | |  For TCP and other non-interactive | |
                   | |  applications.                     | |
                   | -------------------------------------- |
                   ------------------------------------------

Linux QoS is based around a hierarchical structure of queueing disciplines (qdiscs), classes and filters.

There are two major types of qdisc: those that are classful (they contain classes), and those that are classless (leaf qdiscs).

Examples of Classless qdiscs

  • TBF - Token Bucket Filter
  • SFQ - Stochastic Fairness Queueing

Examples of Classful qdiscs

  • CBQ - Class Based Queueing
  • HTB - Hierarchical Token Bucket
  • PRIO - Priority Queueing

Of these classful qdiscs, some, such as PRIO, automagically create classes under the qdisc. Others, such as CBQ, require you to explicitly create the classes inside the qdisc if you wish to apply a qdisc to that class.

You can only apply a qdisc to a class, which is why classless qdiscs are the leaf nodes of the 'QoS tree', so to speak.
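
To make the tree idea concrete, here is a minimal sketch of a classful qdisc with a classless leaf. The device name and handles are only illustrative, and the tc function at the top is a dry-run stub added for this example so that the commands are printed rather than applied; remove it to run them for real (as root).

```shell
# Dry-run stub (an assumption for illustration): print each tc command
# instead of executing it. Delete this function to apply the commands.
tc() { echo "tc $*"; }

# Hypothetical helper: a PRIO root with one SFQ leaf on device $1.
prio_with_sfq_leaf() {
    dev=$1
    # PRIO is classful: adding it creates classes 1:1, 1:2 and 1:3 by itself.
    tc qdisc add dev "$dev" root handle 1: prio
    # SFQ is classless, so it can only sit at a leaf, here under class 1:1.
    tc qdisc add dev "$dev" parent 1:1 handle 10: sfq
}

prio_with_sfq_leaf eth0
```

With CBQ in place of PRIO, the class 1:1 would have to be created with tc class add before the leaf qdisc could be attached to it.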

With this in mind, let us first create the QoS rules on the ppp0 interface.

First, let's delete everything and start with a clean slate.

 root@sleepy:# tc qdisc del dev ppp0 root
 root@sleepy:# tc qdisc show dev ppp0

OK, now we need to add a qdisc to the device's root class. We're using CBQ:

 root@sleepy:# tc qdisc add dev ppp0 root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000
 root@sleepy:# tc qdisc show dev ppp0
 qdisc cbq 10: rate 127Kbit (bounded,isolated) prio no-transmit
 root@sleepy:# tc class show dev ppp0
 class cbq 10: root rate 127Kbit (bounded,isolated) prio no-transmit

Notice how this has created a root class; we could refer to it as 10:0 if we desired. Let's create the first of our high priority queues.

 root@sleepy:# tc class add dev ppp0 parent 10: classid 10:1 cbq bandwidth 6Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
 root@sleepy:# tc class show dev ppp0
 class cbq 10: root rate 127Kbit (bounded,isolated) prio no-transmit
 class cbq 10:1 parent 10: rate 127Kbit (bounded) prio no-transmit

And apply the SFQ to traffic in that queue!

 root@sleepy:# tc qdisc add dev ppp0 parent 10:1 handle 101: sfq
 root@sleepy:# tc qdisc show dev ppp0
 qdisc sfq 101: quantum 1500b
 qdisc cbq 10: rate 127Kbit (bounded,isolated) prio no-transmit

This is what the QoS looks like now.

                   Root Class
                     Qdisc 10: CBQ 127kbit
                   ------------------------------------------
                   | Class 10:1(CBQ Class, 127kbit)         |
                   |  Qdisc 101: SFQ                        |
                   | -------------------------------------- |
                   | |  High Priority Per-User Queue      | |
                   | |  For interactive applications      | |
                   | |  and ICMP etc.                     | |
                   | -------------------------------------- |
                   ------------------------------------------

Let's have a look at what's happening with traffic so far:

 root@sleepy:# tc -d -s qdisc show dev ppp0
 qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

 qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 7672003 bytes 14328 pkts (dropped 70, overlimits 0)
   borrowed 0 overactions 0 avgidle 50393 undertime 0

 root@sleepy:# tc -d -s class show dev ppp0
 class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 7713158 bytes 14548 pkts (dropped 70, overlimits 0)
   borrowed 0 overactions 0 avgidle 48819 undertime 0
 class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
   borrowed 0 overactions 0 avgidle 283475 undertime 0

Well, as we can see, things appear to be working, at least up to a point.

All traffic is being queued by the root class, and none of it is being passed to the SFQ qdisc inside class 10:1.

Now we need to use the third major building block of Linux QoS, the filter.

We need to apply a filter to the root qdisc 10: to move certain traffic to the 10:1 class.

Linux iptables can tag packets with a firewall mark (fwmark) as they traverse the firewall rules, which is useful for classifying packets for QoS based on protocol, IP address or port. The MARK target lives in the mangle table and can be used in the PREROUTING or POSTROUTING chains for any interface. However, because I use IP masquerading on ppp0, the internal IP address of the host sending data to the internet via ADSL has already been rewritten by the time the packet is queued on ppp0, so the qdisc there cannot tell the users apart by source address.

Therefore, we will mark the packets in the PREROUTING chain as they arrive on eth0, while the internal source address is still intact.

 root@sleepy:# iptables -t mangle -I PREROUTING -j MARK --set-mark 100 -s 10.1.13.3/32  -i eth0
 root@sleepy:# iptables  -L -n -t mangle  -x -v
 Chain PREROUTING (policy ACCEPT 367 packets, 125801 bytes)
     pkts      bytes target     prot opt in     out     source               destination
       32     2000 MARK       all  --  eth0   *       10.1.13.3            0.0.0.0/0           MARK set 0x64

We are now marking packets with 100, or hex 0x64.
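
iptables was given the mark in decimal but lists it in hex, so it helps to be able to convert between the two when reading listings. A quick sketch using the shell's printf (the function names are made up for this example):

```shell
# Convert a decimal mark to the hex form shown in iptables listings.
mark_to_hex() { printf '0x%x\n' "$1"; }

# Convert a hex mark back to decimal.
hex_to_mark() { printf '%d\n' "$1"; }

mark_to_hex 100    # prints 0x64
hex_to_mark 0x64   # prints 100
```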

All that remains to get packets into the SFQ is to add the filter with tc.

And the results...

Before:

 root@sleepy:# tc -s -d qdisc show dev ppp0
 qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

 qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 22225127 bytes 39734 pkts (dropped 103, overlimits 0)
   borrowed 0 overactions 0 avgidle 50393 undertime 0

 root@sleepy:# tc -s -d class show dev ppp0
 class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 22206812 bytes 39626 pkts (dropped 103, overlimits 0)
   borrowed 0 overactions 0 avgidle 50393 undertime 0
 class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
  Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
   borrowed 0 overactions 0 avgidle 283475 undertime 0

Adding Filter:

 root@sleepy:# tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 100 fw flowid 10:1
 root@sleepy:# tc filter show dev ppp0
 filter parent 10: protocol ip pref 1 fw
 filter parent 10: protocol ip pref 1 fw handle 0x64 classid 10:1

After:

 root@sleepy:# tc -s -d qdisc show dev ppp0
 qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
  Sent 548 bytes 9 pkts (dropped 0, overlimits 0)

 qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 22437536 bytes 40128 pkts (dropped 103, overlimits 0)
   borrowed 0 overactions 0 avgidle 50393 undertime 0

 root@sleepy:# tc -s -d class show dev ppp0
 class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
  Sent 22308840 bytes 39979 pkts (dropped 103, overlimits 0)
   borrowed 0 overactions 0 avgidle 50393 undertime 0
 class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
  Sent 244 bytes 4 pkts (dropped 0, overlimits 0)
   borrowed 0 overactions 0 avgidle 283475 undertime 0

Alright, we're getting some traffic in the SFQ now!

Adding the rest of the queues is relatively straightforward, provided you use a sane numbering scheme for your qdisc/class handles.

Here is the script to set up the QoS structure for ADSL as above. We're adding 5 user queues for expansion's sake; if they're not used, they don't affect anyone else.

 # Clear the slate
 tc qdisc del dev ppp0 root

 # Add a classful qdisc (cbq) to the root class

 tc qdisc add dev ppp0 root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000

 # High priority Queues
 # Add the 5 High Priority classes under the CBQ Qdisc
 tc class add dev ppp0 parent 10: classid 10:1 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
 tc class add dev ppp0 parent 10: classid 10:2 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
 tc class add dev ppp0 parent 10: classid 10:3 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
 tc class add dev ppp0 parent 10: classid 10:4 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
 tc class add dev ppp0 parent 10: classid 10:5 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded

 # Add leaf qdiscs to these queues
 tc qdisc add dev ppp0 parent 10:1 handle 101: sfq
 tc qdisc add dev ppp0 parent 10:2 handle 102: sfq
 tc qdisc add dev ppp0 parent 10:3 handle 103: sfq
 tc qdisc add dev ppp0 parent 10:4 handle 104: sfq
 tc qdisc add dev ppp0 parent 10:5 handle 105: sfq

 # Add filters to match FWMARK packets and sort into these queues
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 101 fw flowid 10:1
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 102 fw flowid 10:2
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 103 fw flowid 10:3
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 104 fw flowid 10:4
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 105 fw flowid 10:5


 # Low Priority Queues
 tc class add dev ppp0 parent 10: classid 10:6  cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
 tc class add dev ppp0 parent 10: classid 10:7  cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
 tc class add dev ppp0 parent 10: classid 10:8  cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
 tc class add dev ppp0 parent 10: classid 10:9  cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
 tc class add dev ppp0 parent 10: classid 10:10 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded

 # Add leaf qdiscs to these queues
 tc qdisc add dev ppp0 parent 10:6 handle  106: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
 tc qdisc add dev ppp0 parent 10:7 handle  107: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
 tc qdisc add dev ppp0 parent 10:8 handle  108: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
 tc qdisc add dev ppp0 parent 10:9 handle  109: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
 tc qdisc add dev ppp0 parent 10:10 handle 110: tbf rate 96kbit buffer 16384 limit 16384 mpu 200

 # Add filters to match FWMARK packets and sort into these queues
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 106 fw flowid 10:6
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 107 fw flowid 10:7
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 108 fw flowid 10:8
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 109 fw flowid 10:9
 tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 110 fw flowid 10:10
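
The script above repeats the same three commands for every class, so it can also be written as a loop. The following is only a sketch: setup_qos is a hypothetical helper, the parameters are copied from the script above, and the tc function is a dry-run stub that prints the commands instead of applying them.

```shell
# Dry-run stub (an assumption for illustration): print each tc command
# instead of executing it. Delete this function to apply the commands.
tc() { echo "tc $*"; }

# Hypothetical helper: build the tree on device $1 for $2 users.
# Classes 10:1..10:N are the high priority queues, the next N classes the
# low priority ones; the fwmark for class 10:i is 100+i, as in the script.
# Note that tc reads class ids as hex; the digits are used verbatim here,
# which is fine as long as they stay unique.
setup_qos() {
    dev=$1 users=$2
    tc qdisc del dev "$dev" root
    tc qdisc add dev "$dev" root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000
    i=1
    while [ "$i" -le $((users * 2)) ]; do
        if [ "$i" -le "$users" ]; then
            # High priority: 127kbit CBQ class with an SFQ leaf.
            tc class add dev "$dev" parent 10: classid 10:$i cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
            tc qdisc add dev "$dev" parent 10:$i handle $((100 + i)): sfq
        else
            # Low priority: 120kbit CBQ class with a 96kbit TBF leaf.
            tc class add dev "$dev" parent 10: classid 10:$i cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
            tc qdisc add dev "$dev" parent 10:$i handle $((100 + i)): tbf rate 96kbit buffer 16384 limit 16384 mpu 200
        fi
        # Packets carrying fwmark 100+i are sorted into class 10:i.
        tc filter add dev "$dev" protocol ip parent 10: prio 1 handle $((100 + i)) fw flowid 10:$i
        i=$((i + 1))
    done
}

setup_qos ppp0 5
```

Called with ppp0 and 5, this prints the same set of commands as the script, only interleaved per class rather than grouped by type.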

Now all that is needed is to use iptables to MARK packets for the appropriate queues. Remember that MARK is not a terminating target: after a packet is marked, it carries on down the chain and can be re-marked by a later rule. I get around this by marking every packet with an initial value first, then having the sorting rules match only on that initial mark, so once a packet has been sorted it won't be marked again in that table.

Here's what my PREROUTING chain looks like.

 Chain PREROUTING (policy ACCEPT 294 packets, 114705 bytes)
   pkts      bytes target     prot opt in     out     source               destination
      716   207616 MARK       all  --  eth0   *       0.0.0.0/0            0.0.0.0/0           MARK set 0x80
       18     1080 MARK       icmp --  eth0   *       10.1.13.3            0.0.0.0/0           MARK match 0x80 MARK set 0x65
        0        0 MARK       icmp --  eth0   *       10.1.13.7            0.0.0.0/0           MARK match 0x80 MARK set 0x66
        5      420 MARK       icmp --  eth0   *       0.0.0.0/0            0.0.0.0/0           MARK match 0x80 MARK set 0x69
        5      320 MARK       udp  --  eth0   *       10.1.13.3            0.0.0.0/0           MARK match 0x80 MARK set 0x65
        0        0 MARK       udp  --  eth0   *       10.1.13.7            0.0.0.0/0           MARK match 0x80 MARK set 0x66
       52     6074 MARK       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           MARK match 0x80 MARK set 0x69
      159     7296 MARK       all  --  eth0   *       10.1.13.3            0.0.0.0/0           MARK match 0x80 MARK set 0x6a
      435   189427 MARK       all  --  eth0   *       10.1.13.7            0.0.0.0/0           MARK match 0x80 MARK set 0x6b
       42     2999 MARK       all  --  eth0   *       0.0.0.0/0            0.0.0.0/0           MARK match 0x80 MARK set 0x6e
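
The listing shows the rules but not the commands that created them. Below is a sketch of commands that should produce an equivalent chain; it is reconstructed from the listing rather than taken from the original setup. sort_rule and mark_rules are hypothetical helpers, and the iptables function is a dry-run stub that only prints.

```shell
# Dry-run stub (an assumption for illustration): print each iptables
# command instead of executing it. Delete this function to apply them.
iptables() { echo "iptables $*"; }

# Hypothetical helper: append one sorting rule to mangle/PREROUTING.
# $1 = protocol (icmp, udp or all), $2 = source, $3 = new mark.
# The rule only fires on packets still carrying the initial mark 128 (0x80).
sort_rule() {
    iptables -t mangle -A PREROUTING -i eth0 -p "$1" -s "$2" \
        -m mark --mark 128 -j MARK --set-mark "$3"
}

mark_rules() {
    # 1. Mark everything arriving on eth0 with 128 (0x80).
    iptables -t mangle -A PREROUTING -i eth0 -j MARK --set-mark 128

    # 2. Interactive traffic (ICMP, UDP) goes to the high priority classes.
    for proto in icmp udp; do
        sort_rule "$proto" 10.1.13.3 101   # 0x65 -> class 10:1
        sort_rule "$proto" 10.1.13.7 102   # 0x66 -> class 10:2
        sort_rule "$proto" 0.0.0.0/0 105   # 0x69 -> class 10:5 (everyone else)
    done

    # 3. Whatever is still marked 128 goes to the low priority classes.
    sort_rule all 10.1.13.3 106            # 0x6a -> class 10:6
    sort_rule all 10.1.13.7 107            # 0x6b -> class 10:7
    sort_rule all 0.0.0.0/0 110            # 0x6e -> class 10:10
}

mark_rules
```

Because every sorting rule matches on mark 128 only, a packet that has been given its final mark falls through the remaining rules untouched.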

Cross-referencing with the tc filters, we see:

 root@sleepy:# tc filter show dev ppp0
 filter parent 10: protocol ip pref 1 fw
 filter parent 10: protocol ip pref 1 fw handle 0x65 classid 10:1
 filter parent 10: protocol ip pref 1 fw handle 0x66 classid 10:2
 filter parent 10: protocol ip pref 1 fw handle 0x67 classid 10:3
 filter parent 10: protocol ip pref 1 fw handle 0x68 classid 10:4
 filter parent 10: protocol ip pref 1 fw handle 0x69 classid 10:5
 filter parent 10: protocol ip pref 1 fw handle 0x6a classid 10:6
 filter parent 10: protocol ip pref 1 fw handle 0x6b classid 10:7
 filter parent 10: protocol ip pref 1 fw handle 0x6c classid 10:8
 filter parent 10: protocol ip pref 1 fw handle 0x6d classid 10:9
 filter parent 10: protocol ip pref 1 fw handle 0x6e classid 10:10

For example, a UDP packet arriving on eth0 with source address 10.1.13.3 will be marked with 0x65 and sent to class 10:1, a CBQ child class with an SFQ qdisc attached, while a TCP packet from 10.1.13.7 will be marked with 0x6b and sent to class 10:7, a CBQ child class with a token bucket (TBF) qdisc attached.
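
With the numbering used here the lookup is simple arithmetic: mark 100+i lands in class 10:i. A small sketch (class_for_mark is a made-up helper, and it only holds for this particular numbering scheme):

```shell
# Hypothetical helper: given a mark as printed by iptables (hex or decimal),
# print the class it is filtered into, assuming mark 100+i -> class 10:i.
class_for_mark() {
    echo "10:$(( $1 - 100 ))"
}

class_for_mark 0x65   # prints 10:1
class_for_mark 0x6b   # prints 10:7
```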

 root@sleepy:# tc -s -d class show dev ppp0
 ....
 class cbq 10:1 parent 10: leaf 101: rate 127Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10898us
  Sent 23370 bytes 383 pkts (dropped 0, overlimits 0)
   borrowed 0 overactions 0 avgidle 285710 undertime 0
 ...
 class cbq 10:7 parent 10: leaf 107: rate 120Kbit cell 8b (bounded) prio no-transmit/8 weight 5Kbit allot 2250b
 level 0 ewma 5 avpkt 1500b maxidle 86516us
  Sent 4149672 bytes 6294 pkts (dropped 341, overlimits 0)
  backlog 11p
 ....

 root@sleepy:# tc -s -d qdisc show dev ppp0
 ...
 qdisc tbf 107: rate 96Kbit burst 16Kb/8 mpu 200b lat 1us
  Sent 4149672 bytes 6294 pkts (dropped 341, overlimits 35289)
 ...
 qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
  Sent 23370 bytes 383 pkts (dropped 0, overlimits 0)
 ...
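
When there are ten or more queues, the full statistics get long. The sketch below condenses tc -s qdisc show output into one line per qdisc. qdisc_summary is a hypothetical helper, and it assumes the output layout shown in this page, which may differ between iproute2 versions.

```shell
# Hypothetical helper: condense `tc -s qdisc show` output (read from stdin)
# into one line per qdisc: kind, handle, bytes, packets, drops.
qdisc_summary() {
    awk '
        $1 == "qdisc" { kind = $2; handle = $3 }
        $1 == "Sent"  {
            drops = $7; sub(/,$/, "", drops)   # "341," -> "341"
            print kind, handle, $2, $4, drops
        }
    '
}

# Sample input taken from the output above.
qdisc_summary <<'EOF'
qdisc tbf 107: rate 96Kbit burst 16Kb/8 mpu 200b lat 1us
 Sent 4149672 bytes 6294 pkts (dropped 341, overlimits 35289)
qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
 Sent 23370 bytes 383 pkts (dropped 0, overlimits 0)
EOF
# prints:
# tbf 107: 4149672 6294 341
# sfq 101: 23370 383 0
```

On a live system the input would come from a pipe instead: tc -s qdisc show dev ppp0 | qdisc_summary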

What remains is to set up the egress on the internal interface, eth0, and to FWMARK packets coming in on the external interface so that they can be sorted into those queues.
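
As a starting point, the top of the eth0 tree from the first diagram might look like the sketch below. It is untested: the 128kbit rate comes from the diagram, the 100Mbit bandwidth is an assumption for an Ethernet interface, eth0_skeleton is a hypothetical helper, and the tc function is a dry-run stub that only prints. The per-user classes and the filters that steer traffic into the two bands are not shown.

```shell
# Dry-run stub (an assumption for illustration): print each tc command
# instead of executing it. Delete this function to apply the commands.
tc() { echo "tc $*"; }

# Hypothetical helper: PRIO root with a CBQ tree and an SFQ bypass queue.
eth0_skeleton() {
    dev=$1
    # Root PRIO qdisc; it creates classes 1:1, 1:2 and 1:3 on its own.
    tc qdisc add dev "$dev" root handle 1: prio
    # Class 1:1: the shaped CBQ tree, to be filled with per-user classes
    # in the same way as on ppp0.
    tc qdisc add dev "$dev" parent 1:1 handle 10: cbq bandwidth 100Mbit rate 128kbit avpkt 1000
    # Class 1:2: SFQ bypass queue for internal traffic.
    tc qdisc add dev "$dev" parent 1:2 handle 20: sfq
}

eth0_skeleton eth0
```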

If you're still confused, this looks like a rather well commented QoS script: http://archives.seul.org/or/talk/May-2005/msg00066.html

A rather outdated but still useful page showing the various disciplines: http://www.opalsoft.net/qos/DS-21.htm


See also TrafficShaping