version 4, including all changes.
.
Rev |
Author |
# |
Line |
1 |
perry |
1 |
CBQ |
|
|
2 |
!!!CBQ |
|
|
3 |
NAME |
|
|
4 |
SYNOPSIS |
|
|
5 |
DESCRIPTION |
|
|
6 |
SHAPING ALGORITHM |
|
|
7 |
CLASSIFICATION |
|
|
8 |
LINK SHARING ALGORITHM |
|
|
9 |
QDISC |
|
|
10 |
CLASSES |
|
|
11 |
BUGS |
|
|
12 |
SOURCES |
|
|
13 |
SEE ALSO |
|
|
14 |
AUTHOR |
|
|
15 |
---- |
|
|
16 |
!!NAME |
|
|
17 |
|
|
|
18 |
|
|
|
19 |
CBQ - Class Based Queueing |
|
|
20 |
!!SYNOPSIS |
|
|
21 |
|
|
|
22 |
|
|
|
23 |
__tc qdisc ... dev__ dev __( parent__ classid __| |
|
|
24 |
root) [[ handle__ major: __] cbq [[ allot__ bytes __] |
|
|
25 |
avpkt__ bytes __bandwidth__ rate __[[ cell__ bytes |
|
|
26 |
__] [[ ewma__ log __] [[ mpu__ bytes |
|
|
27 |
__]__ |
|
|
28 |
|
|
|
29 |
|
|
|
30 |
__tc class ... dev__ dev __parent__ major:[[minor] __[[ |
|
|
31 |
classid__ major:minor __] cbq allot__ bytes __[[ |
|
|
32 |
bandwidth__ rate __] [[ rate__ rate __] prio__ |
|
|
33 |
priority __[[ weight__ weight __] [[ minburst__ packets |
|
|
34 |
__] [[ maxburst__ packets __] [[ ewma__ log __] [[ |
|
|
35 |
cell__ bytes __] avpkt__ bytes __[[ mpu__ bytes __] |
|
|
36 |
[[ bounded isolated ] [[ split__ handle ____ |
|
|
37 |
defmap __] [[ estimator__ interval timeconstant |
|
|
38 |
__]__ |
|
|
39 |
!!DESCRIPTION |
|
|
40 |
|
|
|
41 |
|
|
|
42 |
Class Based Queueing is a classful qdisc that implements a |
|
|
43 |
rich linksharing hierarchy of classes. It contains shaping |
|
|
44 |
elements as well as prioritizing capabilities. Shaping is |
|
|
45 |
performed using link idle time calculations based on the |
|
|
46 |
timing of dequeue events and underlying link |
|
|
47 |
bandwidth. |
|
|
48 |
!!SHAPING ALGORITHM |
|
|
49 |
|
|
|
50 |
|
|
|
51 |
When shaping a 10mbit/s connection to 1mbit/s, the link will |
|
|
52 |
be idle 90% of the time. If it isn't, it needs to be |
|
|
53 |
throttled so that it IS idle 90% of the time. |
|
|
54 |
|
|
|
55 |
|
|
|
56 |
During operations, the effective idletime is measured using |
|
|
57 |
an exponential weighted moving average (EWMA), which |
|
|
58 |
considers recent packets to be exponentially more important |
|
|
59 |
than past ones. The Unix loadaverage is calculated in the |
|
|
60 |
same way. |
|
|
61 |
|
|
|
62 |
|
|
|
63 |
The calculated idle time is subtracted from the EWMA |
|
|
64 |
measured one, the resulting number is called 'avgidle'. A |
|
|
65 |
perfectly loaded link has an avgidle of zero: packets arrive |
|
|
66 |
exactly at the calculated interval. |
|
|
67 |
|
|
|
68 |
|
|
|
69 |
An overloaded link has a negative avgidle and if it gets too |
|
|
70 |
negative, CBQ throttles and is then |
|
|
71 |
'overlimit'. |
|
|
72 |
|
|
|
73 |
|
|
|
74 |
Conversely, an idle link might amass a huge avgidle, which |
|
|
75 |
would then allow infinite bandwidths after a few hours of |
|
|
76 |
silence. To prevent this, avgidle is capped at |
|
|
77 |
__maxidle.__ |
|
|
78 |
|
|
|
79 |
|
|
|
80 |
If overlimit, in theory, the CBQ could throttle itself for |
|
|
81 |
exactly the amount of time that was calculated to pass |
|
|
82 |
between packets, and then pass one packet, and throttle |
|
|
83 |
again. Due to timer resolution constraints, this may not be |
|
|
84 |
feasible, see the __minburst__ parameter |
|
|
85 |
below. |
|
|
86 |
!!CLASSIFICATION |
|
|
87 |
|
|
|
88 |
|
|
|
89 |
Within the one CBQ instance many classes may exist. Each of |
|
|
90 |
these classes contains another qdisc, by default |
4 |
perry |
91 |
tc-pfifo(8). |
1 |
perry |
92 |
|
|
|
93 |
|
|
|
94 |
When enqueueing a packet, CBQ starts at the root and uses |
|
|
95 |
various methods to determine which class should receive the |
|
|
96 |
data. |
|
|
97 |
|
|
|
98 |
|
|
|
99 |
In the absence of uncommon configuration options, the |
|
|
100 |
process is rather easy. At each node we look for an |
|
|
101 |
instruction, and then go to the class the instruction refers |
|
|
102 |
us to. If the class found is a barren leaf-node (without |
|
|
103 |
children), we enqueue the packet there. If it is not yet a |
|
|
104 |
leaf node, we do the whole thing over again starting from |
|
|
105 |
that node. |
|
|
106 |
|
|
|
107 |
|
|
|
108 |
The following actions are performed, in order at each node |
|
|
109 |
we visit, until one sends us to another node, or terminates |
|
|
110 |
the process. |
|
|
111 |
|
|
|
112 |
|
|
|
113 |
(i) |
|
|
114 |
|
|
|
115 |
|
|
|
116 |
Consult filters attached to the class. If sent to a |
|
|
117 |
leafnode, we are done. Otherwise, restart. |
|
|
118 |
|
|
|
119 |
|
|
|
120 |
(ii) |
|
|
121 |
|
|
|
122 |
|
|
|
123 |
Consult the defmap for the priority assigned to this packet, |
|
|
124 |
which depends on the TOS bits. Check if the referral is |
|
|
125 |
leafless, otherwise restart. |
|
|
126 |
|
|
|
127 |
|
|
|
128 |
(iii) |
|
|
129 |
|
|
|
130 |
|
|
|
131 |
Ask the defmap for instructions for the 'best effort' |
|
|
132 |
priority. Check the answer for leafness, otherwise |
|
|
133 |
restart. |
|
|
134 |
|
|
|
135 |
|
|
|
136 |
(iv) |
|
|
137 |
|
|
|
138 |
|
|
|
139 |
If none of the above returned with an instruction, enqueue |
|
|
140 |
at this node. |
|
|
141 |
|
|
|
142 |
|
|
|
143 |
This algorithm makes sure that a packet always ends up |
|
|
144 |
somewhere, even while you are busy building your |
|
|
145 |
configuration. |
|
|
146 |
|
|
|
147 |
|
|
|
148 |
For more details, see __tc-cbq-details(8).__ |
|
|
149 |
!!LINK SHARING ALGORITHM |
|
|
150 |
|
|
|
151 |
|
|
|
152 |
When dequeuing for sending to the network device, CBQ |
|
|
153 |
decides which of its classes will be allowed to send. It |
|
|
154 |
does so with a Weighted Round Robin process in which each |
|
|
155 |
class with packets gets a chance to send in turn. The WRR |
|
|
156 |
process starts by asking the highest priority classes |
|
|
157 |
(lowest numerically - highest semantically) for packets, and |
|
|
158 |
will continue to do so until they have no more data to |
|
|
159 |
offer, in which case the process repeats for lower |
|
|
160 |
priorities. |
|
|
161 |
|
|
|
162 |
|
|
|
163 |
Classes by default borrow bandwidth from their siblings. A |
|
|
164 |
class can be prevented from doing so by declaring it |
|
|
165 |
'bounded'. A class can also indicate its unwillingness to |
|
|
166 |
lend out bandwidth by being 'isolated'. |
|
|
167 |
!!QDISC |
|
|
168 |
|
|
|
169 |
|
|
|
170 |
The root of a CBQ qdisc class tree has the following |
|
|
171 |
parameters: |
|
|
172 |
|
|
|
173 |
|
|
|
174 |
parent major:minor | root |
|
|
175 |
|
|
|
176 |
|
|
|
177 |
This mandatory parameter determines the place of the CBQ |
|
|
178 |
instance, either at the __root__ of an interface or |
|
|
179 |
within an existing class. |
|
|
180 |
|
|
|
181 |
|
|
|
182 |
handle major: |
|
|
183 |
|
|
|
184 |
|
|
|
185 |
Like all other qdiscs, the CBQ can be assigned a handle. |
|
|
186 |
Should consist only of a major number, followed by a colon. |
|
|
187 |
Optional, but very useful if classes will be generated |
|
|
188 |
within this qdisc. |
|
|
189 |
|
|
|
190 |
|
|
|
191 |
allot bytes |
|
|
192 |
|
|
|
193 |
|
|
|
194 |
This allotment is the 'chunkiness' of link sharing and is |
|
|
195 |
used for determining packet transmission time tables. The |
|
|
196 |
qdisc allot differs slightly from the class allot discussed |
|
|
197 |
below. Optional. Defaults to a reasonable value, related to |
|
|
198 |
avpkt. |
|
|
199 |
|
|
|
200 |
|
|
|
201 |
avpkt bytes |
|
|
202 |
|
|
|
203 |
|
|
|
204 |
The average size of a packet is needed for calculating |
|
|
205 |
maxidle, and is also used for making sure 'allot' has a safe |
|
|
206 |
value. Mandatory. |
|
|
207 |
|
|
|
208 |
|
|
|
209 |
bandwidth rate |
|
|
210 |
|
|
|
211 |
|
|
|
212 |
To determine the idle time, CBQ must know the bandwidth of |
|
|
213 |
your underlying physical interface, or parent qdisc. This is |
|
|
214 |
a vital parameter, more about it later. |
|
|
215 |
Mandatory. |
|
|
216 |
|
|
|
217 |
|
|
|
218 |
cell |
|
|
219 |
|
|
|
220 |
|
|
|
221 |
The cell size determines he granularity of packet |
|
|
222 |
transmission time calculations. Has a sensible |
|
|
223 |
default. |
|
|
224 |
|
|
|
225 |
|
|
|
226 |
mpu |
|
|
227 |
|
|
|
228 |
|
|
|
229 |
A zero sized packet may still take time to transmit. This |
|
|
230 |
value is the lower cap for packet transmission time |
|
|
231 |
calculations - packets smaller than this value are still |
|
|
232 |
deemed to have this size. Defaults to zero. |
|
|
233 |
|
|
|
234 |
|
|
|
235 |
ewma log |
|
|
236 |
|
|
|
237 |
|
|
|
238 |
When CBQ needs to measure the average idle time, it does so |
|
|
239 |
using an Exponentially Weighted Moving Average which |
|
|
240 |
smoothes out measurements into a moving average. The EWMA |
|
|
241 |
LOG determines how much smoothing occurs. Lower values imply |
|
|
242 |
greater sensitivity. Must be between 0 and 31. Defaults to |
|
|
243 |
5. |
|
|
244 |
|
|
|
245 |
|
|
|
246 |
A CBQ qdisc does not shape out of its own accord. It only |
|
|
247 |
needs to know certain parameters about the underlying link. |
|
|
248 |
Actual shaping is done in classes. |
|
|
249 |
!!CLASSES |
|
|
250 |
|
|
|
251 |
|
|
|
252 |
Classes have a host of parameters to configure their |
|
|
253 |
operation. |
|
|
254 |
|
|
|
255 |
|
|
|
256 |
parent major:minor |
|
|
257 |
|
|
|
258 |
|
|
|
259 |
Place of this class within the hierarchy. If attached |
|
|
260 |
directly to a qdisc and not to another class, minor can be |
|
|
261 |
omitted. Mandatory. |
|
|
262 |
|
|
|
263 |
|
|
|
264 |
classid major:minor |
|
|
265 |
|
|
|
266 |
|
|
|
267 |
Like qdiscs, classes can be named. The major number must be |
|
|
268 |
equal to the major number of the qdisc to which it belongs. |
|
|
269 |
Optional, but needed if this class is going to have |
|
|
270 |
children. |
|
|
271 |
|
|
|
272 |
|
|
|
273 |
weight weight |
|
|
274 |
|
|
|
275 |
|
|
|
276 |
When dequeuing to the interface, classes are tried for |
|
|
277 |
traffic in a round-robin fashion. Classes with a higher |
|
|
278 |
configured qdisc will generally have more traffic to offer |
|
|
279 |
during each round, so it makes sense to allow it to dequeue |
|
|
280 |
more traffic. All weights under a class are normalized, so |
|
|
281 |
only the ratios matter. Defaults to the configured rate, |
|
|
282 |
unless the priority of this class is maximal, in which case |
|
|
283 |
it is set to 1. |
|
|
284 |
|
|
|
285 |
|
|
|
286 |
allot bytes |
|
|
287 |
|
|
|
288 |
|
|
|
289 |
Allot specifies how many bytes a qdisc can dequeue during |
|
|
290 |
each round of the process. This parameter is weighted using |
|
|
291 |
the renormalized class weight described above. Silently |
|
|
292 |
capped at a minimum of 3/2 avpkt. Mandatory. |
|
|
293 |
|
|
|
294 |
|
|
|
295 |
prio priority |
|
|
296 |
|
|
|
297 |
|
|
|
298 |
In the round-robin process, classes with the lowest priority |
|
|
299 |
field are tried for packets first. Mandatory. |
|
|
300 |
|
|
|
301 |
|
|
|
302 |
avpkt |
|
|
303 |
|
|
|
304 |
|
|
|
305 |
See the QDISC section. |
|
|
306 |
|
|
|
307 |
|
|
|
308 |
rate rate |
|
|
309 |
|
|
|
310 |
|
|
|
311 |
Maximum rate this class and all its children combined can |
|
|
312 |
send at. Mandatory. |
|
|
313 |
|
|
|
314 |
|
|
|
315 |
bandwidth rate |
|
|
316 |
|
|
|
317 |
|
|
|
318 |
This is different from the bandwidth specified when creating |
|
|
319 |
a CBQ disc! Only used to determine maxidle and offtime, |
|
|
320 |
which are only calculated when specifying maxburst or |
|
|
321 |
minburst. Mandatory if specifying maxburst or |
|
|
322 |
minburst. |
|
|
323 |
|
|
|
324 |
|
|
|
325 |
maxburst |
|
|
326 |
|
|
|
327 |
|
|
|
328 |
This number of packets is used to calculate maxidle so that |
|
|
329 |
when avgidle is at maxidle, this number of average packets |
|
|
330 |
can be burst before avgidle drops to 0. Set it higher to be |
|
|
331 |
more tolerant of bursts. You can't set maxidle directly, |
|
|
332 |
only via this parameter. |
|
|
333 |
|
|
|
334 |
|
|
|
335 |
minburst |
|
|
336 |
|
|
|
337 |
|
|
|
338 |
As mentioned before, CBQ needs to throttle in case of |
|
|
339 |
overlimit. The ideal solution is to do so for exactly the |
|
|
340 |
calculated idle time, and pass 1 packet. However, Unix |
|
|
341 |
kernels generally have a hard time scheduling events shorter |
|
|
342 |
than 10ms, so it is better to throttle for a longer period, |
|
|
343 |
and then pass minburst packets in one go, and then sleep |
|
|
344 |
minburst times longer. |
|
|
345 |
|
|
|
346 |
|
|
|
347 |
The time to wait is called the offtime. Higher values of |
|
|
348 |
minburst lead to more accurate shaping in the long term, but |
|
|
349 |
to bigger bursts at millisecond timescales. |
|
|
350 |
Optional. |
|
|
351 |
|
|
|
352 |
|
|
|
353 |
minidle |
|
|
354 |
|
|
|
355 |
|
|
|
356 |
If avgidle is below 0, we are overlimits and need to wait |
|
|
357 |
until avgidle will be big enough to send one packet. To |
|
|
358 |
prevent a sudden burst from shutting down the link for a |
|
|
359 |
prolonged period of time, avgidle is reset to minidle if it |
|
|
360 |
gets too low. |
|
|
361 |
|
|
|
362 |
|
|
|
363 |
Minidle is specified in negative microseconds, so 10 means |
|
|
364 |
that avgidle is capped at -10us. Optional. |
|
|
365 |
|
|
|
366 |
|
|
|
367 |
bounded |
|
|
368 |
|
|
|
369 |
|
|
|
370 |
Signifies that this class will not borrow bandwidth from its |
|
|
371 |
siblings. |
|
|
372 |
|
|
|
373 |
|
|
|
374 |
isolated |
|
|
375 |
|
|
|
376 |
|
|
|
377 |
Means that this class will not borrow bandwidth to its |
|
|
378 |
siblings |
|
|
379 |
|
|
|
380 |
|
|
|
381 |
split major:minor |
|
|
382 |
|
|
|
383 |
|
|
|
384 |
If consulting filters attached to a class did not give a |
|
|
385 |
verdict, CBQ can also classify based on the packet's |
|
|
386 |
priority. There are 16 priorities available, numbered from 0 |
|
|
387 |
to 15. |
|
|
388 |
|
|
|
389 |
|
|
|
390 |
The defmap specifies which priorities this class wants to |
|
|
391 |
receive, specified as a bitmap. The Least Significant Bit |
|
|
392 |
corresponds to priority zero. The __split__ parameter |
|
|
393 |
tells CBQ at which class the decision must be made, which |
|
|
394 |
should be a (grand)parent of the class you are |
|
|
395 |
adding. |
|
|
396 |
|
|
|
397 |
|
|
|
398 |
As an example, 'tc class add ... classid 10:1 cbq .. split |
|
|
399 |
10:0 defmap c0' configures class 10:0 to send packets with |
|
|
400 |
priorities 6 and 7 to 10:1. |
|
|
401 |
|
|
|
402 |
|
|
|
403 |
The complimentary configuration would then be: 'tc class add |
|
|
404 |
... classid 10:2 cbq ... split 10:0 defmap 3f' Which would |
|
|
405 |
send all packets 0, 1, 2, 3, 4 and 5 to 10:1. |
|
|
406 |
|
|
|
407 |
|
|
|
408 |
estimator interval timeconstant |
|
|
409 |
|
|
|
410 |
|
|
|
411 |
CBQ can measure how much bandwidth each class is using, |
|
|
412 |
which tc filters can use to classify packets with. In order |
|
|
413 |
to determine the bandwidth it uses a very simple estimator |
|
|
414 |
that measures once every __interval__ microseconds how |
|
|
415 |
much traffic has passed. This again is a EWMA, for which the |
|
|
416 |
time constant can be specified, also in microseconds. The |
|
|
417 |
__time constant__ corresponds to the sluggishness of the |
|
|
418 |
measurement or, conversely, to the sensitivity of the |
|
|
419 |
average to short bursts. Higher values mean less |
|
|
420 |
sensitivity. |
|
|
421 |
!!BUGS |
|
|
422 |
|
|
|
423 |
|
|
|
424 |
The actual bandwidth of the underlying link may not be |
|
|
425 |
known, for example in the case of PPoE or PPTP connections |
|
|
426 |
which in fact may send over a pipe, instead of over a |
|
|
427 |
physical device. CBQ is quite resilient to major errors in |
|
|
428 |
the configured bandwidth, probably a the cost of coarser |
|
|
429 |
shaping. |
|
|
430 |
|
|
|
431 |
|
|
|
432 |
Default kernels rely on coarse timing information for making |
|
|
433 |
decisions. These may make shaping precise in the long term, |
|
|
434 |
but inaccurate on second long scales. |
|
|
435 |
|
|
|
436 |
|
|
|
437 |
See tc-cbq-details(8) for hints on how to improve |
|
|
438 |
this. |
|
|
439 |
!!SOURCES |
|
|
440 |
* Sally Floyd and Van Jacobson, |
|
|
441 |
* Sally Floyd, |
|
|
442 |
* Sally Floyd, |
|
|
443 |
* Sally Floyd and Michael Speer, |
|
|
444 |
!!SEE ALSO |
|
|
445 |
|
|
|
446 |
|
|
|
447 |
tc(8) |
|
|
448 |
!!AUTHOR |
|
|
449 |
|
|
|
450 |
|
|
|
451 |
Alexey N. Kuznetsov, |
|
|
452 |
---- |