Penguin
Annotated edit history of tc-cbq(8) version 4, including all changes. View license author blame.
Rev Author # Line
1 perry 1 CBQ
2 !!!CBQ
3 NAME
4 SYNOPSIS
5 DESCRIPTION
6 SHAPING ALGORITHM
7 CLASSIFICATION
8 LINK SHARING ALGORITHM
9 QDISC
10 CLASSES
11 BUGS
12 SOURCES
13 SEE ALSO
14 AUTHOR
15 ----
16 !!NAME
17
18
19 CBQ - Class Based Queueing
20 !!SYNOPSIS
21
22
23 __tc qdisc ... dev__ dev __( parent__ classid __|
24 root) [[ handle__ major: __] cbq [[ allot__ bytes __]
25 avpkt__ bytes __bandwidth__ rate __[[ cell__ bytes
26 __] [[ ewma__ log __] [[ mpu__ bytes
27 __]__
28
29
30 __tc class ... dev__ dev __parent__ major:[[minor] __[[
31 classid__ major:minor __] cbq allot__ bytes __[[
32 bandwidth__ rate __] [[ rate__ rate __] prio__
33 priority __[[ weight__ weight __] [[ minburst__ packets
34 __] [[ maxburst__ packets __] [[ ewma__ log __] [[
35 cell__ bytes __] avpkt__ bytes __[[ mpu__ bytes __]
36 [[ bounded isolated ] [[ split__ handle ____
37 defmap __] [[ estimator__ interval timeconstant
38 __]__
39 !!DESCRIPTION
40
41
42 Class Based Queueing is a classful qdisc that implements a
43 rich linksharing hierarchy of classes. It contains shaping
44 elements as well as prioritizing capabilities. Shaping is
45 performed using link idle time calculations based on the
46 timing of dequeue events and underlying link
47 bandwidth.
48 !!SHAPING ALGORITHM
49
50
51 When shaping a 10mbit/s connection to 1mbit/s, the link will
52 be idle 90% of the time. If it isn't, it needs to be
53 throttled so that it IS idle 90% of the time.
54
55
56 During operations, the effective idletime is measured using
57 an exponential weighted moving average (EWMA), which
58 considers recent packets to be exponentially more important
59 than past ones. The Unix loadaverage is calculated in the
60 same way.
61
62
63 The calculated idle time is subtracted from the EWMA
64 measured one, the resulting number is called 'avgidle'. A
65 perfectly loaded link has an avgidle of zero: packets arrive
66 exactly at the calculated interval.
67
68
69 An overloaded link has a negative avgidle and if it gets too
70 negative, CBQ throttles and is then
71 'overlimit'.
72
73
74 Conversely, an idle link might amass a huge avgidle, which
75 would then allow infinite bandwidths after a few hours of
76 silence. To prevent this, avgidle is capped at
77 __maxidle.__
78
79
80 If overlimit, in theory, the CBQ could throttle itself for
81 exactly the amount of time that was calculated to pass
82 between packets, and then pass one packet, and throttle
83 again. Due to timer resolution constraints, this may not be
84 feasible, see the __minburst__ parameter
85 below.
86 !!CLASSIFICATION
87
88
89 Within the one CBQ instance many classes may exist. Each of
90 these classes contains another qdisc, by default
4 perry 91 tc-pfifo(8).
1 perry 92
93
94 When enqueueing a packet, CBQ starts at the root and uses
95 various methods to determine which class should receive the
96 data.
97
98
99 In the absence of uncommon configuration options, the
100 process is rather easy. At each node we look for an
101 instruction, and then go to the class the instruction refers
102 us to. If the class found is a barren leaf-node (without
103 children), we enqueue the packet there. If it is not yet a
104 leaf node, we do the whole thing over again starting from
105 that node.
106
107
108 The following actions are performed, in order at each node
109 we visit, until one sends us to another node, or terminates
110 the process.
111
112
113 (i)
114
115
116 Consult filters attached to the class. If sent to a
117 leafnode, we are done. Otherwise, restart.
118
119
120 (ii)
121
122
123 Consult the defmap for the priority assigned to this packet,
124 which depends on the TOS bits. Check if the referral is
125 leafless, otherwise restart.
126
127
128 (iii)
129
130
131 Ask the defmap for instructions for the 'best effort'
132 priority. Check the answer for leafness, otherwise
133 restart.
134
135
136 (iv)
137
138
139 If none of the above returned with an instruction, enqueue
140 at this node.
141
142
143 This algorithm makes sure that a packet always ends up
144 somewhere, even while you are busy building your
145 configuration.
146
147
148 For more details, see __tc-cbq-details(8).__
149 !!LINK SHARING ALGORITHM
150
151
152 When dequeuing for sending to the network device, CBQ
153 decides which of its classes will be allowed to send. It
154 does so with a Weighted Round Robin process in which each
155 class with packets gets a chance to send in turn. The WRR
156 process starts by asking the highest priority classes
157 (lowest numerically - highest semantically) for packets, and
158 will continue to do so until they have no more data to
159 offer, in which case the process repeats for lower
160 priorities.
161
162
163 Classes by default borrow bandwidth from their siblings. A
164 class can be prevented from doing so by declaring it
165 'bounded'. A class can also indicate its unwillingness to
166 lend out bandwidth by being 'isolated'.
167 !!QDISC
168
169
170 The root of a CBQ qdisc class tree has the following
171 parameters:
172
173
174 parent major:minor | root
175
176
177 This mandatory parameter determines the place of the CBQ
178 instance, either at the __root__ of an interface or
179 within an existing class.
180
181
182 handle major:
183
184
185 Like all other qdiscs, the CBQ can be assigned a handle.
186 Should consist only of a major number, followed by a colon.
187 Optional, but very useful if classes will be generated
188 within this qdisc.
189
190
191 allot bytes
192
193
194 This allotment is the 'chunkiness' of link sharing and is
195 used for determining packet transmission time tables. The
196 qdisc allot differs slightly from the class allot discussed
197 below. Optional. Defaults to a reasonable value, related to
198 avpkt.
199
200
201 avpkt bytes
202
203
204 The average size of a packet is needed for calculating
205 maxidle, and is also used for making sure 'allot' has a safe
206 value. Mandatory.
207
208
209 bandwidth rate
210
211
212 To determine the idle time, CBQ must know the bandwidth of
213 your underlying physical interface, or parent qdisc. This is
214 a vital parameter, more about it later.
215 Mandatory.
216
217
218 cell
219
220
221 The cell size determines he granularity of packet
222 transmission time calculations. Has a sensible
223 default.
224
225
226 mpu
227
228
229 A zero sized packet may still take time to transmit. This
230 value is the lower cap for packet transmission time
231 calculations - packets smaller than this value are still
232 deemed to have this size. Defaults to zero.
233
234
235 ewma log
236
237
238 When CBQ needs to measure the average idle time, it does so
239 using an Exponentially Weighted Moving Average which
240 smoothes out measurements into a moving average. The EWMA
241 LOG determines how much smoothing occurs. Lower values imply
242 greater sensitivity. Must be between 0 and 31. Defaults to
243 5.
244
245
246 A CBQ qdisc does not shape out of its own accord. It only
247 needs to know certain parameters about the underlying link.
248 Actual shaping is done in classes.
249 !!CLASSES
250
251
252 Classes have a host of parameters to configure their
253 operation.
254
255
256 parent major:minor
257
258
259 Place of this class within the hierarchy. If attached
260 directly to a qdisc and not to another class, minor can be
261 omitted. Mandatory.
262
263
264 classid major:minor
265
266
267 Like qdiscs, classes can be named. The major number must be
268 equal to the major number of the qdisc to which it belongs.
269 Optional, but needed if this class is going to have
270 children.
271
272
273 weight weight
274
275
276 When dequeuing to the interface, classes are tried for
277 traffic in a round-robin fashion. Classes with a higher
278 configured qdisc will generally have more traffic to offer
279 during each round, so it makes sense to allow it to dequeue
280 more traffic. All weights under a class are normalized, so
281 only the ratios matter. Defaults to the configured rate,
282 unless the priority of this class is maximal, in which case
283 it is set to 1.
284
285
286 allot bytes
287
288
289 Allot specifies how many bytes a qdisc can dequeue during
290 each round of the process. This parameter is weighted using
291 the renormalized class weight described above. Silently
292 capped at a minimum of 3/2 avpkt. Mandatory.
293
294
295 prio priority
296
297
298 In the round-robin process, classes with the lowest priority
299 field are tried for packets first. Mandatory.
300
301
302 avpkt
303
304
305 See the QDISC section.
306
307
308 rate rate
309
310
311 Maximum rate this class and all its children combined can
312 send at. Mandatory.
313
314
315 bandwidth rate
316
317
318 This is different from the bandwidth specified when creating
319 a CBQ disc! Only used to determine maxidle and offtime,
320 which are only calculated when specifying maxburst or
321 minburst. Mandatory if specifying maxburst or
322 minburst.
323
324
325 maxburst
326
327
328 This number of packets is used to calculate maxidle so that
329 when avgidle is at maxidle, this number of average packets
330 can be burst before avgidle drops to 0. Set it higher to be
331 more tolerant of bursts. You can't set maxidle directly,
332 only via this parameter.
333
334
335 minburst
336
337
338 As mentioned before, CBQ needs to throttle in case of
339 overlimit. The ideal solution is to do so for exactly the
340 calculated idle time, and pass 1 packet. However, Unix
341 kernels generally have a hard time scheduling events shorter
342 than 10ms, so it is better to throttle for a longer period,
343 and then pass minburst packets in one go, and then sleep
344 minburst times longer.
345
346
347 The time to wait is called the offtime. Higher values of
348 minburst lead to more accurate shaping in the long term, but
349 to bigger bursts at millisecond timescales.
350 Optional.
351
352
353 minidle
354
355
356 If avgidle is below 0, we are overlimits and need to wait
357 until avgidle will be big enough to send one packet. To
358 prevent a sudden burst from shutting down the link for a
359 prolonged period of time, avgidle is reset to minidle if it
360 gets too low.
361
362
363 Minidle is specified in negative microseconds, so 10 means
364 that avgidle is capped at -10us. Optional.
365
366
367 bounded
368
369
370 Signifies that this class will not borrow bandwidth from its
371 siblings.
372
373
374 isolated
375
376
377 Means that this class will not borrow bandwidth to its
378 siblings
379
380
381 split major:minor
382
383
384 If consulting filters attached to a class did not give a
385 verdict, CBQ can also classify based on the packet's
386 priority. There are 16 priorities available, numbered from 0
387 to 15.
388
389
390 The defmap specifies which priorities this class wants to
391 receive, specified as a bitmap. The Least Significant Bit
392 corresponds to priority zero. The __split__ parameter
393 tells CBQ at which class the decision must be made, which
394 should be a (grand)parent of the class you are
395 adding.
396
397
398 As an example, 'tc class add ... classid 10:1 cbq .. split
399 10:0 defmap c0' configures class 10:0 to send packets with
400 priorities 6 and 7 to 10:1.
401
402
403 The complimentary configuration would then be: 'tc class add
404 ... classid 10:2 cbq ... split 10:0 defmap 3f' Which would
405 send all packets 0, 1, 2, 3, 4 and 5 to 10:1.
406
407
408 estimator interval timeconstant
409
410
411 CBQ can measure how much bandwidth each class is using,
412 which tc filters can use to classify packets with. In order
413 to determine the bandwidth it uses a very simple estimator
414 that measures once every __interval__ microseconds how
415 much traffic has passed. This again is a EWMA, for which the
416 time constant can be specified, also in microseconds. The
417 __time constant__ corresponds to the sluggishness of the
418 measurement or, conversely, to the sensitivity of the
419 average to short bursts. Higher values mean less
420 sensitivity.
421 !!BUGS
422
423
424 The actual bandwidth of the underlying link may not be
425 known, for example in the case of PPoE or PPTP connections
426 which in fact may send over a pipe, instead of over a
427 physical device. CBQ is quite resilient to major errors in
428 the configured bandwidth, probably a the cost of coarser
429 shaping.
430
431
432 Default kernels rely on coarse timing information for making
433 decisions. These may make shaping precise in the long term,
434 but inaccurate on second long scales.
435
436
437 See tc-cbq-details(8) for hints on how to improve
438 this.
439 !!SOURCES
440 * Sally Floyd and Van Jacobson,
441 * Sally Floyd,
442 * Sally Floyd,
443 * Sally Floyd and Michael Speer,
444 !!SEE ALSO
445
446
447 tc(8)
448 !!AUTHOR
449
450
451 Alexey N. Kuznetsov,
452 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.