Before you start, if you want an authoritative guide to the following material, read The Linux Advanced Routing and Traffic Control HOWTO instead.
This is a brief but concerted effort to minimise the effect of TCP traffic on interactive applications, using QoS under Linux.
The major motivation for this work was the effect of the new queuing that was imposed ahead of the bandwidth throttling at the RAN on Telecom's network in early 2004.
I live in a flat where three of us each pay a third of the internet bill, so each of the three should have an even crack at the bandwidth.
I devised two queuing structures that should give a decent balance between users, one for egress on the internal interface and one for egress on the external interface.
Egress on internal interface (eth0).

Routing -> [queues below] -> Network card

Root Class
Qdisc 1: PRIO
    Class 1:1: PRIO Queue priority 1
    Qdisc 10: CBQ 128kbit
        Class 10:1 (CBQ Class, 128kbit)
        Qdisc 101: SFQ
            High Priority Per-User Queue. For interactive applications and ICMP etc.
        Class 10:2 (CBQ Class, 128kbit)
        Qdisc 102: SFQ
            High Priority Per-User Queue. For interactive applications and ICMP etc.
        Class 10:x (CBQ Class, 128kbit)
        Qdisc 10x: SFQ
            High Priority Per-User Queue. For interactive applications and ICMP etc.
        Class 10:5 (CBQ Class, 120kbit)
        Qdisc 105: TBF (Rate 96kbit)
            Low Priority Per-User Queue. For TCP and other non-interactive applications.
        Class 10:6 (CBQ Class, 120kbit)
        Qdisc 106: TBF (Rate 96kbit)
            Low Priority Per-User Queue. For TCP and other non-interactive applications.
        Class 10:x (CBQ Class, 120kbit)
        Qdisc 10x: TBF (Rate 96kbit)
            Low Priority Per-User Queue. For TCP and other non-interactive applications.
    Class 1:2 (PRIO band priority 2)
    Qdisc 20: SFQ
        Bypass Queue for internal Traffic

Egress on external interface (ppp0).

Routing -> [queues below] -> ADSL modem

Root Class
Qdisc 10: CBQ 128kbit
    Class 10:1 (CBQ Class, 128kbit)
    Qdisc 101: SFQ
        High Priority Per-User Queue. For interactive applications and ICMP etc.
    Class 10:2 (CBQ Class, 128kbit)
    Qdisc 102: SFQ
        High Priority Per-User Queue. For interactive applications and ICMP etc.
    Class 10:x (CBQ Class, 128kbit)
    Qdisc 10x: SFQ
        High Priority Per-User Queue. For interactive applications and ICMP etc.
    Class 10:5 (CBQ Class, 120kbit)
    Qdisc 105: TBF (Rate 96kbit)
        Low Priority Per-User Queue. For TCP and other non-interactive applications.
    Class 10:6 (CBQ Class, 120kbit)
    Qdisc 106: TBF (Rate 96kbit)
        Low Priority Per-User Queue. For TCP and other non-interactive applications.
    Class 10:x (CBQ Class, 120kbit)
    Qdisc 10x: TBF (Rate 96kbit)
        Low Priority Per-User Queue. For TCP and other non-interactive applications.
Linux QoS is based around a hierarchical structure of queueing disciplines (qdiscs), classes and filters.
There are two major types of qdiscs: those that are classful, or contain classes, and those that are classless, or leaf qdiscs.
Examples of Classless qdiscs
Examples of Classful qdiscs
Of these classful qdiscs, some, such as PRIO, automagically create classes under the qdisc. Others, such as CBQ, require you to explicitly create the classes inside the qdisc if you wish to apply a qdisc to that class.
You can only apply a qdisc to a class, which is why classless qdiscs are the leaf nodes of the 'QoS tree', so to speak.
With this in mind, let us first create the QoS rules on the ppp0 interface.
First, let's delete everything and start with a clean slate.
root@sleepy:# tc qdisc del dev ppp0 root
root@sleepy:# tc qdisc show dev ppp0
OK, now we need to add the qdisc to the device root class. We're using CBQ.
root@sleepy:# tc qdisc add dev ppp0 root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000
root@sleepy:# tc qdisc show dev ppp0
qdisc cbq 10: rate 127Kbit (bounded,isolated) prio no-transmit
root@sleepy:# tc class show dev ppp0
class cbq 10: root rate 127Kbit (bounded,isolated) prio no-transmit
Notice how this has created a root class; we could refer to it as 10:0 if we desired. Let's create the first of our high priority queues.
root@sleepy:# tc class add dev ppp0 parent 10: classid 10:1 cbq bandwidth 6Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
root@sleepy:# tc class show dev ppp0
class cbq 10: root rate 127Kbit (bounded,isolated) prio no-transmit
class cbq 10:1 parent 10: rate 127Kbit (bounded) prio no-transmit
And apply the SFQ to traffic in that queue!
root@sleepy:# tc qdisc add dev ppp0 parent 10:1 handle 101: sfq
root@sleepy:# tc qdisc show dev ppp0
qdisc sfq 101: quantum 1500b
qdisc cbq 10: rate 127Kbit (bounded,isolated) prio no-transmit
This is what the QoS looks like now.
Root Class
Qdisc 10: CBQ 127kbit
    Class 10:1 (CBQ Class, 127kbit)
    Qdisc 101: SFQ
        High Priority Per-User Queue. For interactive applications and ICMP etc.
Let's have a look at what's happening with traffic so far:
root@sleepy:# tc -d -s qdisc show dev ppp0
qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 7672003 bytes 14328 pkts (dropped 70, overlimits 0)
 borrowed 0 overactions 0 avgidle 50393 undertime 0
root@sleepy:# tc -d -s class show dev ppp0
class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 7713158 bytes 14548 pkts (dropped 70, overlimits 0)
 borrowed 0 overactions 0 avgidle 48819 undertime 0
class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 borrowed 0 overactions 0 avgidle 283475 undertime 0
Well, as we can see, things appear to be working, but only up to a point.
All traffic is being queued by the root class, and none of it is being passed to the SFQ qdisc inside class 10:1 (its counters still read 0 bytes).
Now we need to use the third major building block of Linux QoS, the filter.
We need to apply a filter to the root qdisc 10: to move certain traffic to the 10:1 class.
Linux iptables comes with a feature called fwmark, which allows packets to be tagged as they traverse the firewall rules. This is useful for tagging packets for QoS based on protocol, IP address and port information. The iptables MARK target can be applied in the PREROUTING or POSTROUTING chains on any interface. However, because I use IP masquerading on my ppp0 interface, the internal IP of the source host sending data to the internet via ADSL has already been lost in the ppp0 POSTROUTING chain.
Therefore, we will use the PREROUTING chain on the eth0 interface to mark the packets.
root@sleepy:# iptables -t mangle -I PREROUTING -j MARK --set-mark 100 -s 10.1.13.3/32 -i eth0
root@sleepy:# iptables -L -n -t mangle -x -v
Chain PREROUTING (policy ACCEPT 367 packets, 125801 bytes)
    pkts      bytes target  prot opt in    out  source       destination
      32       2000 MARK    all  --  eth0  *    10.1.13.3    0.0.0.0/0    MARK set 0x64
We are now marking packets with 100, or hex 0x64.
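Since iptables and tc print marks back in hex while the commands above were given decimal values, a small conversion helper can save some head-scratching. This is only a convenience sketch; the function names mark_to_hex and hex_to_mark are made up for illustration and are not part of tc or iptables.

```shell
# Convert between the decimal mark passed to --set-mark and the
# hex form that iptables and tc print back. Function names are
# illustrative only.
mark_to_hex() { printf '0x%x\n' "$1"; }
hex_to_mark() { printf '%d\n' "$1"; }

mark_to_hex 100    # prints 0x64
hex_to_mark 0x64   # prints 100
```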
All that remains to get packets into the SFQ is to add the filter with tc:
And the results...
Before:
root@sleepy:# tc -s -d qdisc show dev ppp0
qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 22225127 bytes 39734 pkts (dropped 103, overlimits 0)
 borrowed 0 overactions 0 avgidle 50393 undertime 0
root@sleepy:# tc -s -d class show dev ppp0
class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 22206812 bytes 39626 pkts (dropped 103, overlimits 0)
 borrowed 0 overactions 0 avgidle 50393 undertime 0
class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
 borrowed 0 overactions 0 avgidle 283475 undertime 0
Adding Filter:
root@sleepy:# tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 100 fw flowid 10:1
root@sleepy:# tc filter show dev ppp0
filter parent 10: protocol ip pref 1 fw
filter parent 10: protocol ip pref 1 fw handle 0x64 classid 10:1
After:
root@sleepy:# tc -s -d qdisc show dev ppp0
qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
 Sent 548 bytes 9 pkts (dropped 0, overlimits 0)
qdisc cbq 10: rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 22437536 bytes 40128 pkts (dropped 103, overlimits 0)
 borrowed 0 overactions 0 avgidle 50393 undertime 0
root@sleepy:# tc -s -d class show dev ppp0
class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 level 1 ewma 5 avpkt 1000b maxidle 1921us
 Sent 22308840 bytes 39979 pkts (dropped 103, overlimits 0)
 borrowed 0 overactions 0 avgidle 50393 undertime 0
class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10812us
 Sent 244 bytes 4 pkts (dropped 0, overlimits 0)
 borrowed 0 overactions 0 avgidle 283475 undertime 0
Alright, we're getting some traffic in the SFQ now!
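Comparing the before and after output by eye gets tedious once there are more classes. As a rough sketch, the counters can be boiled down to one line per class with awk. The function name class_counters is hypothetical, and the parsing assumes the "class ..." / "Sent ..." layout shown in the tc output on this page; the sample input is an abbreviated copy of the "After" output.

```shell
# Summarise `tc -s class show` output as: <classid> <bytes> bytes <pkts> pkts
# Intended use (as root): tc -s -d class show dev ppp0 | class_counters
class_counters() {
    awk '
        /^class / { id = $3 }                        # remember the current class id
        $1 == "Sent" { print id, $2, "bytes", $4, "pkts" }
    '
}

# Demonstration on an abbreviated copy of the "After" output.
class_counters <<'EOF'
class cbq 10: root rate 127Kbit cell 8b (bounded,isolated) prio no-transmit/8 weight 127Kbit allot 1500b
 Sent 22308840 bytes 39979 pkts (dropped 103, overlimits 0)
class cbq 10:1 parent 10: leaf 101: rate 128Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 Sent 244 bytes 4 pkts (dropped 0, overlimits 0)
EOF
# prints:
# 10: 22308840 bytes 39979 pkts
# 10:1 244 bytes 4 pkts
```

A class that stays at 0 bytes while the root class keeps counting is the sign that no filter is steering traffic into it.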
Adding the rest of the queues is relatively straightforward, provided you use a sane numbering scheme for your qdisc/class handles.
Here is the script to set up the QoS structure for ADSL as above. We're adding 5 user queues for expansion's sake; if they're not used, they don't affect anyone else.
# Clear the slate
tc qdisc del dev ppp0 root

# Add a classful qdisc (cbq) to the root class
tc qdisc add dev ppp0 root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000

# High priority Queues
# Add the 5 High Priority classes under the CBQ Qdisc
tc class add dev ppp0 parent 10: classid 10:1 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
tc class add dev ppp0 parent 10: classid 10:2 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
tc class add dev ppp0 parent 10: classid 10:3 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
tc class add dev ppp0 parent 10: classid 10:4 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
tc class add dev ppp0 parent 10: classid 10:5 cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded

# Add leaf qdiscs to these queues
tc qdisc add dev ppp0 parent 10:1 handle 101: sfq
tc qdisc add dev ppp0 parent 10:2 handle 102: sfq
tc qdisc add dev ppp0 parent 10:3 handle 103: sfq
tc qdisc add dev ppp0 parent 10:4 handle 104: sfq
tc qdisc add dev ppp0 parent 10:5 handle 105: sfq

# Add filters to match FWMARK packets and sort into these queues
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 101 fw flowid 10:1
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 102 fw flowid 10:2
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 103 fw flowid 10:3
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 104 fw flowid 10:4
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 105 fw flowid 10:5

# Low Priority Queues
tc class add dev ppp0 parent 10: classid 10:6 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
tc class add dev ppp0 parent 10: classid 10:7 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
tc class add dev ppp0 parent 10: classid 10:8 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
tc class add dev ppp0 parent 10: classid 10:9 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
tc class add dev ppp0 parent 10: classid 10:10 cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded

# Add leaf qdiscs to these queues
tc qdisc add dev ppp0 parent 10:6 handle 106: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
tc qdisc add dev ppp0 parent 10:7 handle 107: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
tc qdisc add dev ppp0 parent 10:8 handle 108: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
tc qdisc add dev ppp0 parent 10:9 handle 109: tbf rate 96kbit buffer 16384 limit 16384 mpu 200
tc qdisc add dev ppp0 parent 10:10 handle 110: tbf rate 96kbit buffer 16384 limit 16384 mpu 200

# Add filters to match FWMARK packets and sort into these queues
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 106 fw flowid 10:6
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 107 fw flowid 10:7
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 108 fw flowid 10:8
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 109 fw flowid 10:9
tc filter add dev ppp0 protocol ip parent 10: prio 1 handle 110 fw flowid 10:10
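Because the numbering scheme is regular (user n gets class 10:n and mark 100+n for high priority, and class 10:(n+5) and mark 105+n for low priority), the same set of commands can be produced by a loop. The following is a sketch rather than the script I actually run: the variable names TC, DEV and USERS and the function build_queues are invented here, and by default it only echoes the commands so they can be inspected before being run for real.

```shell
#!/bin/sh
# Dry run by default: TC="echo tc" just prints each command.
# Set TC=tc and run as root to apply them. TC, DEV and USERS are
# illustrative names, not anything tc itself knows about.
TC="${TC:-echo tc}"
DEV="${DEV:-ppp0}"
USERS="${USERS:-5}"

build_queues() {
    $TC qdisc del dev "$DEV" root
    $TC qdisc add dev "$DEV" root handle 10: cbq bandwidth 6Mbit rate 127kbit avpkt 1000
    i=1
    while [ "$i" -le "$USERS" ]; do
        hi=$i                 # high priority class 10:1 .. 10:5
        lo=$((i + USERS))     # low priority class 10:6 .. 10:10

        # High priority: CBQ class, SFQ leaf, fw filter on mark 100+hi
        $TC class add dev "$DEV" parent 10: classid "10:$hi" cbq bandwidth 100Mbit rate 127kbit weight 20kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 200 bounded
        $TC qdisc add dev "$DEV" parent "10:$hi" handle "$((100 + hi)):" sfq
        $TC filter add dev "$DEV" protocol ip parent 10: prio 1 handle "$((100 + hi))" fw flowid "10:$hi"

        # Low priority: CBQ class, TBF leaf, fw filter on mark 100+lo
        $TC class add dev "$DEV" parent 10: classid "10:$lo" cbq bandwidth 100Mbit rate 120kbit weight 5kbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1500 bounded
        $TC qdisc add dev "$DEV" parent "10:$lo" handle "$((100 + lo)):" tbf rate 96kbit buffer 16384 limit 16384 mpu 200
        $TC filter add dev "$DEV" protocol ip parent 10: prio 1 handle "$((100 + lo))" fw flowid "10:$lo"

        i=$((i + 1))
    done
}

build_queues
```

With the defaults this prints the same 32 tc commands as the script above, only grouped per user instead of per command type.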
Now all that is needed is to use iptables to MARK packets for the appropriate queues. Remember that MARK does not end rule traversal the way RETURN would: a marked packet carries on down the chain, and a later matching rule can overwrite its mark. I get around this by marking every packet with a placeholder mark first, then having the sorting rules match on that placeholder, so once a packet has been sorted, it won't be marked again in that table.
Here's what my PREROUTING chain looks like.
Chain PREROUTING (policy ACCEPT 294 packets, 114705 bytes)
    pkts      bytes target  prot opt in    out  source       destination
     716     207616 MARK    all  --  eth0  *    0.0.0.0/0    0.0.0.0/0    MARK set 0x80
      18       1080 MARK    icmp --  eth0  *    10.1.13.3    0.0.0.0/0    MARK match 0x80 MARK set 0x65
       0          0 MARK    icmp --  eth0  *    10.1.13.7    0.0.0.0/0    MARK match 0x80 MARK set 0x66
       5        420 MARK    icmp --  eth0  *    0.0.0.0/0    0.0.0.0/0    MARK match 0x80 MARK set 0x69
       5        320 MARK    udp  --  eth0  *    10.1.13.3    0.0.0.0/0    MARK match 0x80 MARK set 0x65
       0          0 MARK    udp  --  eth0  *    10.1.13.7    0.0.0.0/0    MARK match 0x80 MARK set 0x66
      52       6074 MARK    udp  --  eth0  *    0.0.0.0/0    0.0.0.0/0    MARK match 0x80 MARK set 0x69
     159       7296 MARK    all  --  eth0  *    10.1.13.3    0.0.0.0/0    MARK match 0x80 MARK set 0x6a
     435     189427 MARK    all  --  eth0  *    10.1.13.7    0.0.0.0/0    MARK match 0x80 MARK set 0x6b
      42       2999 MARK    all  --  eth0  *    0.0.0.0/0    0.0.0.0/0    MARK match 0x80 MARK set 0x6e
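A chain like the one listed above could be built with commands along the following lines. This is a reconstruction from the listing, not a copy of the commands I originally typed: the variable names IPT and LAN_IF and the helper functions are invented for illustration, and by default everything is only echoed. Rules are appended with -A so they end up in the order shown.

```shell
#!/bin/sh
# Dry run by default: IPT="echo iptables" just prints each command.
# Set IPT=iptables and run as root to apply. IPT, LAN_IF, mark_rule
# and build_marks are illustrative names only.
IPT="${IPT:-echo iptables}"
LAN_IF="${LAN_IF:-eth0}"

# mark_rule PROTO SOURCE NEWMARK
# Only re-marks packets that still carry the placeholder mark 0x80,
# so a packet that has already been sorted is left alone.
mark_rule() {
    $IPT -t mangle -A PREROUTING -i "$LAN_IF" -p "$1" -s "$2" -m mark --mark 0x80 -j MARK --set-mark "$3"
}

build_marks() {
    # Step 1: give every packet from the LAN the placeholder mark.
    $IPT -t mangle -A PREROUTING -i "$LAN_IF" -j MARK --set-mark 0x80

    # Step 2: interactive traffic (ICMP, UDP) goes to the high priority marks.
    for proto in icmp udp; do
        mark_rule "$proto" 10.1.13.3 0x65
        mark_rule "$proto" 10.1.13.7 0x66
        mark_rule "$proto" 0.0.0.0/0 0x69
    done

    # Step 3: whatever is still marked 0x80 goes to the low priority marks.
    mark_rule all 10.1.13.3 0x6a
    mark_rule all 10.1.13.7 0x6b
    mark_rule all 0.0.0.0/0 0x6e
}

build_marks
```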
Cross-referencing with the tc filters, we see:
root@sleepy:# tc filter show dev ppp0
filter parent 10: protocol ip pref 1 fw
filter parent 10: protocol ip pref 1 fw handle 0x65 classid 10:1
filter parent 10: protocol ip pref 1 fw handle 0x66 classid 10:2
filter parent 10: protocol ip pref 1 fw handle 0x67 classid 10:3
filter parent 10: protocol ip pref 1 fw handle 0x68 classid 10:4
filter parent 10: protocol ip pref 1 fw handle 0x69 classid 10:5
filter parent 10: protocol ip pref 1 fw handle 0x6a classid 10:6
filter parent 10: protocol ip pref 1 fw handle 0x6b classid 10:7
filter parent 10: protocol ip pref 1 fw handle 0x6c classid 10:8
filter parent 10: protocol ip pref 1 fw handle 0x6d classid 10:9
filter parent 10: protocol ip pref 1 fw handle 0x6e classid 10:10
For example, a UDP packet arriving on eth0 with source address 10.1.13.3 will be marked with 0x65 and sent to class 10:1, which has an SFQ qdisc on a child class of the CBQ, while a TCP packet from 10.1.13.7 will be marked with 0x6b and sent to class 10:7, which has a token bucket qdisc on a child class of the CBQ.
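With the numbering used on this page (mark 100+n is sent to class 10:n, classes 1 to 5 have SFQ leaves and 6 to 10 have TBF leaves), the cross-reference can be worked out mechanically. The helper below is purely illustrative; class_for_mark is not a real tool, and it only encodes the scheme of this particular setup.

```shell
# Map a firewall mark to the class and leaf qdisc type it lands in,
# assuming this page's scheme: mark 100+n -> class 10:n,
# n = 1..5 use sfq, n = 6..10 use tbf.
class_for_mark() {
    minor=$(( $1 - 100 ))
    if [ "$minor" -le 5 ]; then leaf=sfq; else leaf=tbf; fi
    printf '10:%d %s\n' "$minor" "$leaf"
}

class_for_mark 0x65   # prints: 10:1 sfq
class_for_mark 0x6b   # prints: 10:7 tbf
```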
root@sleepy:# tc -s -d class show dev ppp0
....
class cbq 10:1 parent 10: leaf 101: rate 127Kbit cell 8b (bounded) prio no-transmit/8 weight 20Kbit allot 1514b
 level 0 ewma 5 avpkt 200b maxidle 10898us
 Sent 23370 bytes 383 pkts (dropped 0, overlimits 0)
 borrowed 0 overactions 0 avgidle 285710 undertime 0
...
class cbq 10:7 parent 10: leaf 107: rate 120Kbit cell 8b (bounded) prio no-transmit/8 weight 5Kbit allot 2250b
 level 0 ewma 5 avpkt 1500b maxidle 86516us
 Sent 4149672 bytes 6294 pkts (dropped 341, overlimits 0)
 backlog 11p
....
root@sleepy:# tc -s -d qdisc show dev ppp0
...
qdisc tbf 107: rate 96Kbit burst 16Kb/8 mpu 200b lat 1us
 Sent 4149672 bytes 6294 pkts (dropped 341, overlimits 35289)
...
qdisc sfq 101: quantum 1500b limit 128p flows 128/1024
 Sent 23370 bytes 383 pkts (dropped 0, overlimits 0)
...
What remains to do is to set up the egress on the internal interface, eth0, and to FWMARK packets coming in on the external interface, so that they can be sorted into the queues.
If you're still confused, this looks like a rather well commented QoS script: http://archives.seul.org/or/talk/May-2005/msg00066.html
A rather outdated but still useful page showing the various disciplines: http://www.opalsoft.net/qos/DS-21.htm
See also TrafficShaping