Annotated edit history of sox(1) version 3, including all changes.
Rev Author # Line
3 JohnMcPherson 1 See also our SoxNotes for user-submitted examples/comments.
2 ----
1 perry 3 SoX
4 !!!SoX
5 NAME
6 SYNOPSIS
7 DESCRIPTION
8 OPTIONS
9 FILE TYPES
10 EFFECTS
11 BUGS
12 FILES
13 SEE ALSO
14 NOTICES
15 AUTHOR
16 ----
17 !!NAME
18
19
20 sox - Sound eXchange : universal sound sample translator
21 !!SYNOPSIS
22
23
24 __sox__ ''infile outfile''
25
26
27 __sox__ [[ ''general options'' ] [[ ''format
28 options'' ] ''infile''
29 [[ ''format options'' ] ''outfile''
30 [[ ''effect'' [[ ''effect options'' ] ... ]
31
32
33 __soxmix__ ''infile1 infile2 outfile''
34
35
36 __soxmix__ [[ ''general options'' ] [[ ''format
37 options'' ] ''infile1''
38 [[ ''format options'' ] ''infile2''
39 [[ ''format options'' ] ''outfile''
40 [[ ''effect'' [[ ''effect options'' ] ... ]
41
42
43 __General options:__
44 [[ -h ] [[ -p ] [[ -v ''volume'' ] [[ -V ]
45
46
47 __Format options:__
48 [[ -t ''filetype'' ] [[ -r ''rate'' ] [[
49 -s/-u/-U/-A/-a/-i/-g/-f ] [[ -b/-w/-l ] [[ -c ''channels''
50 ] [[ -x ] [[ -e ]
51
52
53 __Effects:
54 avg__ [[ -l | -r | -f | -b | n,n,...,n ]__
55 band__ [[ -n ] ''center'' [[ ''width'' ]__
56 bandpass__ ''frequency bandwidth''__
57 bandreject__ ''frequency bandwidth''__
58 chorus__ ''gain-in gain-out delay decay speed
59 depth''
60 -s | -t [[ ''delay decay speed depth'' -s | -t ]__
61 compand__
62 ''attack1'',''decay1''[[,''attack2'',''decay2''...]''
63 in-dB1'',''out-dB1''[[,''in-dB2'',''out-dB2''...]
64 [[ ''gain'' [[ ''initial-volume'' [[ ''delay'' ] ]
65 ]__
66 copy
67 dcshift__ ''shift'' [[ ''limitergain'' ]__
68 deemph
69 earwax
70 echo__ ''gain-in gain-out delay decay'' [[ ''delay
71 decay ...'' ]__
72 echos__ ''gain-in gain-out delay decay'' [[ ''delay
73 decay ...'' ]__
74 fade__ [[ ''type'' ] ''fade-in-length'' [[
75 ''stop-time'' [[ ''fade-out-length'' ] ]__
76 filter__ [[ ''low'' ]-[[ ''high'' ] [[
77 ''window-len'' [[ ''beta'' ]]__
78 flanger__ ''gain-in gain-out delay decay speed''
79 ''
80 highp__ ''frequency''__
81 highpass__ ''frequency''__
82 lowp__ ''frequency''__
83 lowpass__ ''frequency''__
84 map
85 mask
86 pan__ ''direction''__
87 phaser__ ''gain-in gain-out delay decay speed''
88 ''
89 pick__ [[ ''-1'' | ''-2'' | ''-3'' | ''-4'' |
90 ''-l'' | ''-r'' ]__
91 pitch__ ''shift'' [[ ''width interpole fade''
92 ]__
93 polyphase__ [[ -w __nut'' / ''ham''
94 ''-width'' ''long'' / ''short'' / #
95 ''-cutoff #'' ]__
96 rate
97 resample__ [[ -qs | -q | -ql ] [[ ''rolloff'' [[
98 ''beta'' ] ]__
99 reverb__ ''gain-out reverb-time delay'' [[ ''delay''
100 ... ]__
101 reverse
102 silence__ ''above_periods'' [[ ''duration
103 threshold''[[ ''d'' | ''%'' ] [[ ''below_periods
104 duration threshold''[[ ''d'' | ''%'' ]]__
105 speed__ [[ -c ] ''factor''__
106 split
107 stat__ [[ -s ''n'' ] [[ -rms ] [[ -v ] [[ -d ]__
108 stretch__ [[ ''factor'' [[ ''window fade shift
109 fading'' ]__
110 swap__ [[ ''1 2'' | ''1 2 3 4'' ]__
111 synth__ [[ ''length'' ] ''type mix'' [[ ''freq'' [[
112 ''-freq2'' ] [[ ''off'' ] [[ ''ph'' ] [[ ''p1'' ] [[
113 ''p2'' ] [[ ''p3'' ]__
114 trim__ ''start'' [[ ''length'' ]__
115 vibro__ ''speed'' [[ ''depth'' ]__
116 vol__ ''gain'' [[ ''type'' [[ ''limitergain'' ]
117 ]
118 !!DESCRIPTION
119
120
121 ''SoX'' is a command line program that can convert most
122 popular audio files to most other popular audio file
123 formats. It can optionally change the audio sample data type
124 and apply one or more sound effects to the file during this
125 translation.
126
127
128 ''soxmix'' is functionally the same as the command line
129 program ''sox'' except that it takes two files as input
130 and mixes the audio together to produce a single file as
131 output. It has a restriction that both input files must be
132 of the same data type and sample rate.
133
134
135 There are two types of audio file formats that ''SoX''
136 can work with. The first are self-describing file formats.
137 These contain a header that completely describes the
138 characteristics of the audio data that follows.
139
140
141 The second type is header-less data, sometimes called
142 raw data. A user must pass enough information to ''SoX''
143 on the command line so that it knows what type of data the
144 file contains.
145
146
147 Audio data can usually be totally described by four
148 characteristics:
149
150
151 rate
152
153
154 The sample rate is in samples per second. For example, CD
155 audio uses a sample rate of 44100.
156
157
158 data size
159
160
161 The precision the data is stored in. Most popular are 8-bit
162 bytes or 16-bit words.
163
164
165 data encoding
166
167
168 What encoding the data type uses. Examples are u-law, ADPCM,
169 or signed linear data.
170
171
172 channels
173
174
175 How many channels are contained in the audio data. Mono and
176 Stereo are the two most common.
177
178
179 Please refer to the __soxexam(1)__ manual page for a long
180 description with examples on how to use SoX with various
181 types of file formats.
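The four characteristics above are enough to work out how much space uncompressed, header-less audio occupies. The following is a minimal Python sketch of that arithmetic, not part of SoX; the function name is made up for illustration, and it only applies to linear encodings where every sample takes a fixed number of bytes.

```python
def raw_audio_bytes(rate_hz, bytes_per_sample, channels, seconds):
    """Size in bytes of uncompressed, header-less audio.

    Hypothetical helper: rate * data size * channels * duration.
    """
    return int(rate_hz * bytes_per_sample * channels * seconds)


# CD-style audio: 44100 samples per second, 16-bit (2-byte) words, stereo.
cd_bytes_per_second = raw_audio_bytes(44100, 2, 2, 1)
print(cd_bytes_per_second)
```

The same numbers are what a user has to supply on the command line for raw data, since no header is present to carry them.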
182 !!OPTIONS
183
184
185 The option syntax is a little grotty, but in
186 essence:
187
188
189 sox file.au file.wav
190
191
192 translates a sound file in SUN Sparc .AU format into a
193 Microsoft .WAV file, while
194
195
196 sox -v 0.5 file.au -r 12000 file.wav mask
197
198
199 does the same format translation but also lowers the
200 amplitude by 1/2, changes the sampling rate to 12000 hertz,
201 and applies the __mask__ sound effect to the audio
202 data.
203
204
205 The following will mix two sound files together to
206 produce a single sound file.
207
208
209 soxmix music.wav voice.wav mixed.wav
210
211
212 __Format options:__
213
214
215 Format options affect the audio samples that they
216 immediately precede. If they are placed before the input
217 file name then they affect the input data. If they are
218 placed before the output file name then they will affect the
219 output data. By taking advantage of this, you can override an
220 input file's corrupted header or produce an output file that
221 is a totally different style than the input file. It is also
222 how SoX is informed about the format of raw input
223 data.
224
225
226 __-t__ ''filetype''
227
228
229 gives the type of the sound sample file. Useful when file
230 extension is not standard or for specifying the .auto file
231 type.
232
233
234 __-r__ ''rate''
235
236
237 Gives the sample rate in Hertz of the file. To cause the
238 output file to have a different sample rate than the input
239 file, include this option as a part of the output
240 options.
241 If the input and output files have different rates then a
242 sample rate change effect must be run. If a sample rate
243 changing effect is not specified then a default one will
244 be run internally by SoX using its default
245 parameters.
246
247
248 __-s/-u/-U/-A/-a/-i/-g/-f__
249
250
251 The sample data encoding is signed linear (2's complement),
252 unsigned linear, u-law (logarithmic), A-law (logarithmic),
253 ADPCM, IMA_ADPCM, GSM, or Floating-point.
254 U-law (actually shorthand for mu-law) and A-law are the U.S.
255 and international standards for logarithmic telephone sound
256 compression. When uncompressed, u-law has roughly the
257 precision of 14-bit PCM audio and A-law has roughly the
258 precision of 13-bit PCM audio.
259 A-law and u-law data is sometimes encoded using a reversed
260 bit-ordering (i.e. MSB becomes LSB). Internally, SoX
261 understands how to work with this encoding but there is
262 currently no command line option to specify it. If you need
263 this support then you can use the pseudo file types of
264 ADPCM is a form of sound compression that offers a good
265 compromise between sound quality and fast
266 encoding/decoding time. It is used for telephone sound
267 compression and places where full fidelity is not as
268 important. When uncompressed it has roughly the precision of
269 16-bit PCM audio. Popular versions of ADPCM include G.726, MS
270 ADPCM, and IMA ADPCM. The __-a__ flag has different
271 meanings in different file handlers. In __.wav__ files it
272 represents MS ADPCM files, in all others it means G.726
273 ADPCM. IMA ADPCM is a specific form of ADPCM compression,
274 slightly simpler and slightly lower fidelity than
275 Microsoft's flavor of ADPCM. IMA ADPCM is also called DVI
276 ADPCM.
277 GSM is a standard used for telephone sound compression in
278 European countries and it is gaining popularity because of its
279 quality. It is usually CPU intensive to work with GSM audio
280 data.
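To illustrate what "logarithmic" means for the u-law encoding mentioned above, here is a small Python sketch of the continuous mu-law companding curve with mu = 255. It is the textbook formula only, not the code SoX uses; real telephone codecs use a segmented approximation of this curve, and the function names are invented for the example.

```python
import math

MU = 255.0  # companding constant used by 8-bit mu-law telephony


def mulaw_compress(x):
    """Map a linear sample in [-1.0, 1.0] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress: recover the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# A quiet sample keeps far more of the output range than it would
# under linear coding, which is where the extra precision comes from.
quiet = mulaw_compress(0.01)
print(quiet, mulaw_expand(quiet))
```

Quiet samples are spread over a large part of the output range while loud samples are squeezed together, which is why 8 bits of u-law can stand in for a wider linear word.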
281
282
283 __-b/-w/-l__
284
285
286 The sample data size is in bytes, 16-bit words, or 32-bit
287 long words.
288
289
290 __-x__ The sample data is in XINU format; that is, it
291 comes from a machine with the opposite word order than yours
292 and must be swapped according to the word-size given above.
293 Only 16-bit and 32-bit integer data may be swapped.
294 Machine-format floating-point data is not
295 portable.
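The word swapping that __-x__ asks for can be pictured with a few lines of Python. This is only a sketch of the idea, assuming raw sample data held in a bytes object; the helper name is hypothetical and this is not how SoX itself is written.

```python
def swap_words(raw, word_size):
    """Reverse the byte order of every word in a block of raw sample data.

    word_size is 2 for 16-bit words or 4 for 32-bit long words, matching
    the sizes that can be swapped.
    """
    if word_size not in (2, 4):
        raise ValueError("only 16-bit and 32-bit data may be swapped")
    if len(raw) % word_size:
        raise ValueError("data length is not a whole number of words")
    out = bytearray(len(raw))
    for i in range(0, len(raw), word_size):
        # Copy each word back in reversed byte order.
        out[i:i + word_size] = raw[i:i + word_size][::-1]
    return bytes(out)


print(swap_words(b"\x01\x02\x03\x04", 2))
print(swap_words(b"\x01\x02\x03\x04", 4))
```

Swapping twice gives back the original data, so the same operation converts in either direction.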
296
297
298 __-c__ ''channels''
299
300
301 The number of sound channels in the data file. This may be
302 1, 2, or 4; for mono, stereo, or quad sound data. To cause
303 the output file to have a different number of channels than
304 the input file, include this option with the output file
305 options. If the input and output file have a different
306 number of channels then the avg effect must be used. If the
307 avg effect is not specified on the command line it will be
308 invoked internally with default parameters.
309
310
311 __-e__ When used after the input filename (so that it
312 applies to the output file) it allows you to avoid giving an
313 output filename and will not produce an output file. It will
314 apply any specified effects to the input file. This is
315 mainly useful with the __stat__ effect but can be used
316 with others.
317
318
319 __General options:__
320
321
322 __-h__ Print version number and usage
323 information.
324
325
326 __-p__ Run in preview mode and run fast. This will
327 somewhat speed up SoX when the output format has a different
328 number of channels and a different rate than the input file.
329 Currently, this defaults to using the __rate__ effect
330 instead of the __resample__ effect for sample rate
331 changes.
332
333
334 __-v__ ''volume''
335
336
337 Change amplitude (floating point); less than 1.0 decreases,
338 greater than 1.0 increases. May use a negative number to
339 invert the phase of the audio data. It is interesting to
340 note that we perceive volume logarithmically but this
341 adjusts the amplitude linearly.
342 Note: see the __stat__ effect for information on finding
343 the maximum value that can be used with this option without
344 causing audio data to be clipped.
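Because __-v__ scales amplitude linearly while loudness is usually discussed in decibels, it can help to convert between the two. The Python sketch below uses the standard 20*log10 relation for amplitude ratios; the function names are made up for the example and nothing here is SoX code.

```python
import math


def volume_factor_to_db(factor):
    """Express a linear amplitude factor (as given to -v) in decibels."""
    if factor == 0:
        raise ValueError("a factor of 0 is silence, which has no finite dB value")
    # abs() because a negative factor only inverts the phase.
    return 20.0 * math.log10(abs(factor))


def db_to_volume_factor(db):
    """Linear amplitude factor that corresponds to a change of db decibels."""
    return 10.0 ** (db / 20.0)


print(round(volume_factor_to_db(0.5), 2))
print(round(db_to_volume_factor(-20.0), 3))
```

Halving the amplitude is a drop of about 6 dB, and a factor of 0.1 is a drop of 20 dB.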
345
346
347 __-V__ Print a description of processing phases. Useful
348 for figuring out exactly how ''SoX'' is mangling your
349 sound samples.
350 !!FILE TYPES
351
352
353 ''SoX'' attempts to determine the file type of input
354 files automatically by looking at the header of the audio
355 file. When it is unable to detect the file type or if it is an
356 output file then it uses the file extension of the file to
357 determine what type of file format handler to use. This can
358 be overridden by specifying the
359 ''
360
361
362 The input and output files may be standard input and
363 standard output. This is done by specifying '-' as the
364 filename.
365
366
367 File formats which have headers are checked; if a header
368 doesn't seem right, the program exits with an appropriate
369 message.
370
371
372 The following file formats are supported:
373
374
375 __.8svx__
376
377
378 Amiga 8SVX musical instrument description
379 format.
380
381
382 __.aiff__
383
384
385 AIFF files used on Apple IIc/IIgs and SGI. Note: the AIFF
386 format supports only one SSND chunk. It does not support
387 multiple sound chunks, or the 8SVX musical instrument
388 description format. AIFF files are multimedia archives and
389 can have multiple audio and picture chunks. You may need a
390 separate archiver to work with them.
391
392
393 __.au__
394
395
396 SUN Microsystems AU files. There are apparently many types
397 of .au files; DEC has invented its own with a different
398 magic number and word order. The .au handler can read these
399 files but will not write them. Some .au files have valid AU
400 headers and some do not. The latter are probably original
401 SUN u-law 8000 Hz samples. These can be dealt with using the
402 __.ul__ format (see below).
403
404
405 __.avr__
406
407
408 Audio Visual Research
409 The AVR format is produced by a number of commercial
410 packages on the Mac.
411
412
413 __.cdr__
414
415
416 CD-R
417 CD-R files are used in mastering music on Compact Discs. The
418 audio data on a CD-R disc is a raw audio file with a format
419 of stereo 16-bit signed samples at a 44.1 kHz sample rate.
420 There is a special blocking/padding oddity at the end of the
421 audio file, which is why it needs its own handler.
422
423
424 __.cvs__
425
426
427 Continuously Variable Slope Delta modulation
428 Used to compress speech audio for applications such as voice
429 mail.
430
431
432 __.dat__
433
434
435 Text Data files
436 These files contain a textual representation of the sample
437 data. There is one line at the beginning that contains the
438 sample rate. Subsequent lines contain two numeric data
439 items: the time since the beginning of the first sample and
440 the sample value. Values are normalized so that the maximum
441 and minimum are 1.00 and -1.00. This file format can be used
442 to create data files for external programs such as FFT
443 analyzers or graph routines. SoX can also convert a file in
444 this format back into one of the other file
445 formats.
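An external program can read such a text file with very little code. The Python sketch below is only a guess at a tolerant reader: the exact wording of the first line is not given here, so it assumes the sample rate is the last whitespace-separated token on that line, and the sample text in the usage lines is invented for illustration.

```python
def parse_dat(text):
    """Parse text data: one rate line, then 'time value' pairs.

    Assumption: the sample rate is the last token of the first line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rate = float(lines[0].split()[-1])
    samples = []
    for ln in lines[1:]:
        time_s, value = ln.split()[:2]
        samples.append((float(time_s), float(value)))
    return rate, samples


# Hypothetical file contents, not real SoX output.
example = "rate 8000\n0 0.5\n0.000125 -1.0\n"
rate, samples = parse_dat(example)
print(rate, samples)
```

Since the values are normalized to the range -1.00 to 1.00, the second column can be plotted or fed to an analysis routine without further scaling.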
446
447
448 __.gsm__
449
450
451 GSM 06.10 Lossy Speech Compression
452 A standard for compressing speech which is used in the
453 Global Standard for Mobile telecommunications (GSM). It's good
454 for its purpose, shrinking audio data size, but it will
455 introduce lots of noise when a given sound sample is encoded
456 and decoded multiple times. This format is used by some
457 voice mail applications. It is rather CPU intensive.
458 GSM in __SoX__ is optional and requires access to an
459 external GSM library. To see if there is support for gsm run
460 __sox -h__ and look for it under the list of supported
461 file formats.
462
463
464 __.hcom__
465
466
467 Macintosh HCOM files. These are (apparently) Mac FSSD files
468 with some variant of Huffman compression. The Macintosh has
469 wacky file formats and this format handler apparently
470 doesn't handle all the ones it should. Mac users will need
471 their usual arsenal of file converters to deal with an HCOM
472 file under Unix or DOS.
473
474
475 __.maud__
476
477
478 An Amiga format
2 perry 479 An IFF-conforming sound file type, registered by MS !MacroSystem
1 perry 480 Computer GmbH, published along with the
481
482
483 __.nul__
484
485
486 Null file handler. This is a fake file handler that acts as if
487 it is reading a stream of 0's from a file or fake writing
488 output to a file. This is not a very useful file handler in
489 most cases. It might be useful in some scripts where you do
490 not want to read or write from a real file but would like to
491 specify a filename for consistency.
492
493
494 __.ogg__
495
496
497 Ogg Vorbis Compressed Audio.
498 Ogg Vorbis is an open, patent-free CODEC designed for
499 compressing music and streaming audio. It is similar to MP3,
500 VQF, AAC, and other lossy formats. __SoX__ can decode all
501 types of Ogg Vorbis files, but can only encode at 128 kbps.
502 Decoding is somewhat CPU intensive and encoding is very CPU
503 intensive.
504 Ogg Vorbis in __SoX__ is optional and requires access to
505 external Ogg Vorbis libraries. To see if there is support
506 for Ogg Vorbis run __sox -h__ and look for it under the
507 list of supported file formats as
508 __
509
510
511 __ossdsp__
512
513
514 OSS /dev/dsp device driver
515 This is a pseudo-file type and can be optionally compiled
516 into SoX. Run __sox -h__ to see if you have support for
517 this file type. When this driver is used it allows you to
518 open up the OSS /dev/dsp file and configure it to use the
519 same data format as passed in to __SoX__. It works for
520 both playing and recording sound samples. When playing sound
521 files it attempts to set up the OSS driver to use the same
522 format as the input file. It is suggested to always override
523 the output values to use the highest quality samples your
524 sound card can handle. Example: ''-t ossdsp -w -s
525 /dev/dsp''
526
527
528 __.sf__
529
530
531 IRCAM Sound Files.
532 Sound Files are used by academic music software such as the
2 perry 533 CSound package, and the !MixView sound sample
1 perry 534 editor.
535
536
537 __.sph__
538
539
540 SPHERE (SPeech HEader Resources) is a file format defined by
541 NIST (National Institute of Standards and Technology) and is
542 used with speech audio. SoX can read these files when they
543 contain u-law and PCM data. It will ignore any header
544 information that says the data is compressed using
545 ''shorten'' compression and will treat the data as either
546 u-law or PCM. This will allow SoX and the command line
547 ''shorten'' program to be run together using pipes to
548 uncompress the data and then pass the result to SoX for
549 processing.
550
551
552 __.smp__
553
554
2 perry 555 Turtle Beach !SampleVision files.
556 SMP files are for use with the PC-DOS package !SampleVision
1 perry 557 by Turtle Beach Softworks. This package is for communication
558 to several MIDI samplers. All sample rates are supported by
559 the package, although not all are supported by the samplers
560 themselves. Currently loop points are ignored.
561
562
563 __.snd__
564
565
566 Under DOS this file format is the same as the __.sndt__
567 format. Under all other platforms it is the same as the
568 __.au__ format.
569
570
571 __.sndt__
572
573
2 perry 574 !SoundTool files.
1 perry 575 This is an older DOS file format.
576
577
578 __sunau__
579
580
581 Sun /dev/audio device driver
582 This is a pseudo-file type and can be optionally compiled
583 into SoX. Run __sox -h__ to see if you have support for
584 this file type. When this driver is used it allows you to
585 open up a Sun /dev/audio file and configure it to use the
586 same data type as passed in to __SoX.__ It works for both
587 playing and recording sound samples. When playing sound
588 files it attempts to set up the audio driver to use the same
589 format as the input file. It is suggested to always override
590 the output values to use the highest quality samples your
591 hardware can handle. Example: ''-t sunau -w -s
592 /dev/audio'' or ''-t sunau -U -c 1 /dev/audio'' for
593 older Sun equipment.
594
595
596 __.txw__
597
598
599 Yamaha TX-16W sampler.
600 A file format from a Yamaha sampling keyboard which wrote
601 IBM-PC format 3.5
602
603
604 __.vms__
605
606
607 More info to come.
608 Used to compress speech audio for applications such as voice
609 mail.
610
611
612 __.voc__
613
614
615 Sound Blaster VOC files.
616 VOC files are multi-part and contain silence parts, looping,
617 and different sample rates for different chunks. On input,
618 the silence parts are filled out, loops are rejected, and
619 sample data with a new sample rate is rejected. Silence with
620 a different sample rate is generated appropriately. On
621 output, silence is not detected, nor are impossible sample
622 rates. Note, this version now supports playing VOC files
623 with multiple blocks and supports playing files containing
624 u-law and A-law samples.
625
626
627 __vorbis__
628
629
630 See __.ogg__ format.
631
632
633 __.wav__
634
635
636 Microsoft .WAV RIFF files.
637 These appear to be very similar to IFF files, but not the
638 same. They are the native sound file format of Windows.
639 (Obviously, Windows was of such incredible importance to the
640 computer industry that it just had to have its own sound
641 file format.) Normally __.wav__ files have all formatting
642 information in their headers, and so do not need any format
643 options specified for an input file. If any are, they will
644 override the file header, and you will be warned to this
645 effect. You had better know what you are doing! Output
646 format options will cause a format conversion, and the
647 __.wav__ will be written appropriately. SoX currently can
648 read PCM, ULAW, ALAW, MS ADPCM, and IMA (or DVI) ADPCM. It
649 can write all of these formats including __(NEW!)__ the
650 ADPCM encoding.
651
652
653 __.wve__
654
655
656 Psion 8-bit A-law
657 These are 8-bit A-law 8 kHz sound files used on the Psion
658 palmtop portable computer.
659
660
661 __.raw__
662
663
664 Raw files (no header).
665 The sample rate, size (byte, word, etc), and encoding
666 (signed, unsigned, etc.) of the sample file must be given.
667 The number of channels defaults to 1.
668
669
670 __.ub, .sb, .uw, .sw, .ul, .al, .lu, .la,
671 .sl__
672
673
674 These are several suffixes which serve as a shorthand for
675 raw files with a given size and encoding. Thus, __ub, sb,
676 uw, sw, ul, al, lu, la__ and __sl__ correspond to
677 __
678
679
680 __.auto__
681
682
683 This is a "meta-type": specifying this type for an input
684 file triggers some code that tries to guess the real type by
685 looking for magic words in the header. If the type can't be
686 guessed, the program exits with an error message. The input
687 must be a plain file, not a pipe. This type can't be used
688 for output files.
689 !!EFFECTS
690
691
692 Multiple effects may be applied to the audio data by
693 specifying them one after another at the end of the command
694 line.
695
696
697 avg [[ ''-l'' | ''-r'' | ''-f'' | ''-b'' |
698 ''n,n,...,n'' ]
699
700
701 Reduce the number of channels by averaging the samples, or
702 duplicate channels to increase the number of channels. This
703 effect is automatically used when the number of input
704 channels differs from the number of output channels. When
705 reducing the number of channels it is possible to manually
706 specify the avg effect and use the ''-l'', ''-r'',
707 ''-f'', or ''-b'' options to select only the left,
708 right, front, or back channel(s) for the output instead of
709 averaging the channels. The ''-f'' and ''-b'' options
710 maintain left/right stereo separation; use the avg effect
711 twice to select a single channel.
712
713
714 The avg effect can also be invoked with up to 16
715 double-precision numbers, which specify the proportion of
716 each input channel that is to be mixed into each output
717 channel. In two-channel mode, 4 numbers are given: l-
718
719
720 It is also possible to use the 16 numbers to expand or
721 reduce the channel count; just specify 0 for unused
722 channels. Finally, if fewer than 4 numbers are given,
723 certain special abbreviations may be invoked; see the source
724 code for details.
725
726
727 band __[[__ ''-n'' __]__ ''center'' __[[__
728 ''width'' __]__
729
730
731 Apply a band-pass filter. The frequency response drops
732 logarithmically around the ''center'' frequency. The
733 ''width'' gives the slope of the drop. The frequencies at
734 ''center + width'' and ''center - width'' will be half
735 of their original amplitudes. __Band__ defaults to a mode
736 oriented to pitched signals, i.e. voice, singing, or
737 instrumental music. The ''-n'' (for noise) option uses
738 the alternate mode for un-pitched signals. __Warning:__
739 ''-n'' introduces a power-gain of about 11dB in the
740 filter, so beware of output clipping. __Band__ introduces
741 noise in the shape of the filter, i.e. peaking at the
742 ''center'' frequency and settling around it. See
743 __filter__ for a bandpass effect with steeper
744 shoulders.
745
746
747 bandpass ''frequency bandwidth''
748
749
750 Butterworth bandpass filter. Description coming
751 soon!
752
753
754 bandreject ''frequency bandwidth''
755
756
757 Butterworth bandreject filter. Description coming
758 soon!
759
760
761 chorus ''gain-in gain-out delay decay speed
762 depth''
763
764
765 -s | ''-t [[ delay decay speed depth -s'' | ''-t ...''
766 ]
767
768
769 Add a chorus to a sound sample. Each quadruple
770 delay/decay/speed/depth gives the delay in milliseconds and
771 the decay (relative to gain-in) with a modulation speed in
772 Hz using depth in milliseconds. The modulation is either
773 sinusoidal (-s) or triangular (-t). Gain-out is the volume
774 of the output.
775
776
777 compand
778 ''attack1,decay1''[[,''attack2,decay2''...]
779
780
781 ''in-dB1,out-dB1''[[,''in-dB2,out-dB2''...]
782
783
784 [[''gain'' [[''initial-volume'' [[''delay'' ] ]
785 ]
786
787
788 Compand (compress or expand) the dynamic range of a sample.
789 The attack and decay times specify the integration time over
790 which the absolute value of the input signal is integrated
791 to determine its volume; attacks refer to increases in
792 volume and decays refer to decreases. Where more than one
793 pair of attack/decay parameters are specified, each channel
794 is treated separately and the number of pairs must agree
795 with the number of input channels. The second parameter is a
796 list of points on the compander's transfer function
797 specified in dB relative to the maximum possible signal
798 amplitude. The input values must be in a strictly increasing
799 order but the transfer function does not have to be
800 monotonically rising. The special value ''-inf'' may be
801 used to indicate that the input volume should be associated
802 output volume. The points ''-inf,-inf'' and ''0,0''
803 are assumed; the latter may be overridden, but the former
804 may not.
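The list of transfer function points can be pictured as a lookup table in dB. The Python sketch below is one possible reading, not the SoX implementation: it assumes straight-line interpolation between neighbouring points, adds the assumed ''0,0'' point when the list does not override it, and simply refuses input levels below the lowest listed point rather than guessing how the ''-inf,-inf'' end is handled. The function name is hypothetical.

```python
def compand_transfer(points, in_db):
    """Look up the output level (dB) for an input level (dB).

    points is a non-empty list of (in_dB, out_dB) pairs whose in_dB values
    are strictly increasing. Linear interpolation is an assumption.
    """
    if not points:
        raise ValueError("at least one point is required")
    for (a, _), (b, _) in zip(points, points[1:]):
        if not a < b:
            raise ValueError("input values must be strictly increasing")
    pts = list(points)
    if pts[-1][0] < 0.0:
        pts.append((0.0, 0.0))  # the 0,0 point is assumed unless overridden
    if not pts[0][0] <= in_db <= pts[-1][0]:
        raise ValueError("input level outside the listed points")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= in_db <= x1:
            return y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
    return pts[0][1]  # table with a single point


curve = [(-60.0, -40.0), (-20.0, -10.0)]
print(compand_transfer(curve, -40.0))
print(compand_transfer(curve, -10.0))
```

With this example curve quiet passages are raised more than loud ones, which compresses the dynamic range; a curve whose outputs fall below its inputs at the quiet end would expand it instead.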
805
806
807 The third (optional) parameter is a post-processing gain in
808 dB which is applied after the compression has taken place;
809 the fourth (optional) parameter is an initial volume to be
810 assumed for each channel when the effect starts. This
811 permits the user to supply a nominal level initially, so
812 that, for example, a very large gain is not applied to
813 initial signal levels before the companding action has begun
814 to operate: it is quite probable that in such an event, the
815 output would be severely clipped while the compander gain
816 properly adjusts itself.
817
818
819 The fifth (optional) parameter is a delay in seconds. The
820 input signal is analyzed immediately to control the
821 compander, but it is delayed before being fed to the volume
822 adjuster. Specifying a delay approximately equal to the
823 attack/decay times allows the compander to effectively
824 operate in a
825
826
827 copy
828
829
830 Copy the input file to the output file. This is the default
831 effect if both files have the same sampling
832 rate.
833
834
835 dcshift ''shift'' [[ ''limitergain'' ]
836
837
838 DC Shift the audio data, with basic linear amplitude
839 formula. This is most useful if your audio data tends to not
840 be centered around a value of 0. Shifting it back will allow
841 you to get the most volume adjustments without clipping
842 audio data.
843 The first option is the ''dcshift'' value. It is a
844 floating point number that indicates the amount to
845 shift.
846 An optional limitergain value can be specified as well. It
847 should have a value much less than 1.0 and is used only on
848 peaks to prevent clipping.
849
850
851 deemph
852
853
854 Apply a treble attenuation shelving filter to samples in
855 audio cd format. The frequency response of pre-emphasized
856 recordings is rectified. The filtering is defined in the
857 standard document ISO 908.
858
859
860 earwax
861
862
863 Makes sound easier to listen to on headphones. Adds
864 audio-cues to samples in audio cd format so that when
865 listened to on headphones the stereo image is moved from
866 inside your head (standard for headphones) to outside and in
867 front of the listener (standard for speakers). See
868 www.geocities.com/beinges for a full
869 explanation.
870
871
872 echo ''gain-in gain-out delay decay'' [[ ''delay decay
873 ...'' ]
874
875
876 Add echoing to a sound sample. Each delay/decay part gives
877 the delay in milliseconds and the decay (relative to
878 gain-in) of that echo. Gain-out is the volume of the
879 output.
880
881
882 echos ''gain-in gain-out delay decay'' [[ ''delay decay
883 ...'' ]
884
885
886 Add a sequence of echos to a sound sample. Each delay/decay
887 part gives the delay in milliseconds and the decay (relative
888 to gain-in) of that echo. Gain-out is the volume of the
889 output.
890
891
892 fade [[ ''type'' ] ''fade-in-length''
893
894
895 [[ ''stop-time'' [[ ''fade-out-length'' ] ]
896
897
898 Add a fade effect to the beginning, end, or both of the
899 audio data.
900
901
902 For fade-ins, this starts from the first sample and ramps
903 the volume of the audio from 0 to full volume over
904 ''fade-in-length'' seconds. Specify 0 seconds if no
905 fade-in is wanted.
906
907
908 For fade-outs, the audio data will be truncated at the
909 stop-time and the volume will be ramped from full volume
910 down to 0 starting at ''fade-out-length'' seconds before
911 the ''stop-time''. No fade-out is performed if these
912 options are not specified.
913 All times can be specified in either periods of time or
914 sample counts. To specify time periods use the
915 hh:mm:ss.frac format. To specify using sample counts,
916 specify the number of samples and append the letter 's' to
917 the sample count (for example 8000s).
918 An optional ''type'' can be specified to change the type
919 of envelope. Choices are q for quarter of a sinewave, h for
920 half a sinewave, t for linear slope, l for logarithmic, and
921 p for inverted parabola. The default is a linear
922 slope.
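The two ways of writing a time, and the default linear envelope, can be sketched in Python as follows. This is an illustration rather than SoX code: the helper names are invented, and accepting fewer than three colon-separated fields (for example a bare "1.5") is an assumption made for convenience.

```python
def fade_time_to_samples(spec, rate):
    """Convert 'hh:mm:ss.frac' or '<count>s' into a number of samples."""
    if spec.endswith("s"):
        # Sample count form, for example 8000s.
        return int(spec[:-1])
    seconds = 0.0
    for field in spec.split(":"):
        # Each further field is worth 1/60 of the one before it.
        seconds = seconds * 60.0 + float(field)
    return int(round(seconds * rate))


def linear_fade_in_gain(n, fade_len):
    """Gain of sample n during a linear fade-in lasting fade_len samples."""
    if fade_len <= 0 or n >= fade_len:
        return 1.0  # no fade-in wanted, or the ramp has finished
    return n / fade_len


print(fade_time_to_samples("8000s", 8000))
print(fade_time_to_samples("00:00:01.5", 8000))
print(linear_fade_in_gain(50, 100))
```

The other envelope types would replace the straight ramp in the second function with a curve of the same start and end values.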
923
924
925 filter [[ ''low'' ]-[[ ''high'' ] [[ ''window-len'' [[
926 ''beta'' ] ]
927
928
929 Apply a Sinc-windowed lowpass, highpass, or bandpass filter
930 of given window length to the signal. ''low'' refers to
931 the frequency of the lower 6dB corner of the filter.
932 ''high'' refers to the frequency of the upper 6dB corner
933 of the filter.
934
935
936 A lowpass filter is obtained by leaving ''low''
937 unspecified, or 0. A highpass filter is obtained by leaving
938 ''high'' unspecified, or 0, or greater than or equal to
939 the Nyquist frequency.
940
941
942 The ''window-len'', if unspecified, defaults to 128.
943 Longer windows give a sharper cutoff, smaller windows a more
944 gradual cutoff.
945
946
947 The ''beta'', if unspecified, defaults to 16. This
948 selects a Kaiser window. You can select a Nuttall window by
949 specifying anything
950 ''resample__ effect.
951
952
953 flanger ''gain-in gain-out delay decay speed''
954 ''
955
956
957 Add a flanger to a sound sample. Each triple
958 delay/decay/speed gives the delay in milliseconds and the
959 decay (relative to gain-in) with a modulation speed in Hz.
960 The modulation is either sinusoidal (-s) or triangular (-t).
961 Gain-out is the volume of the output.
962
963
964 highp ''frequency''
965
966
967 Apply a single pole recursive high-pass filter. The
968 frequency response drops logarithmically with ''frequency'' in
969 the middle of the drop. The slope of the filter is quite
970 gentle. See __filter__ for a highpass effect with sharper
971 cutoff.
972
973
974 highpass ''frequency''
975
976
977 Butterworth highpass filter. Description coming
978 soon!
979
980
981 lowp ''frequency''
982
983
984 Apply a single pole recursive low-pass filter. The frequency
985 response drops logarithmically with ''frequency'' in the
986 middle of the drop. The slope of the filter is quite gentle.
987 See __filter__ for a lowpass effect with sharper
988 cutoff.
989
990
991 lowpass ''frequency''
992
993
994 Butterworth lowpass filter. Description coming
995 soon!
996
997
998 map
999
1000
1001 Display a list of loops in a sample, and miscellaneous loop
1002 info.
1003
1004
1005 mask
1006
1007
1008 Add
1009
1010
1011 pan ''direction''
1012
1013
1014 Pan the sound of an audio file from one channel to another.
1015 This is done by changing the volume of the input channels so
1016 that it fades out on one channel and fades in on another. If
1017 the number of input channels is different from the number of
1018 output channels then this effect tries to intelligently
1019 handle this. For instance, if the input contains 1 channel
1020 and the output contains 2 channels, then it will create the
1021 missing channel itself. The ''direction'' is a value from
1022 -1.0 to 1.0. -1.0 represents far left and 1.0 represents far
1023 right. Numbers in between will start the pan effect without
1024 totally muting the opposite channel.
1025
1026
1027 phaser ''gain-in gain-out delay decay speed''
1028 -s | -t
1029
1030
1031 Add a phaser to a sound sample. Each triple
1032 delay/decay/speed gives the delay in milliseconds and the
1033 decay (relative to gain-in) with a modulation speed in Hz.
1034 The modulation is either sinusoidal (-s) or triangular (-t).
1035 The decay should be less than 0.5 to avoid feedback.
1036 Gain-out is the volume of the output.
1037
1038
1039 pick [[ ''-1'' | ''-2'' | ''-3'' | ''-4'' |
1040 ''-l'' | ''-r'' ]
1041
1042
1043 Select the left or right channel of a stereo sample, or one
1044 of four channels in a quadraphonic sample. The ''-l'' and
1045 ''-r'' options represent either the left or right
1046 channel. It is required that you use the __-c 1__ command
1047 line option in order to force the output file to contain
1048 only 1 channel.
1049
1050
1051 pitch ''shift [[ width interpole fade ]''
1052
1053
1054 Change the pitch of a file without affecting its duration by
1055 cross-fading shifted samples. ''shift'' is given in
1056 cents. Use a positive value to shift to treble, negative
1057 value to shift to bass. Default shift is 0. ''width'' of
1058 window is in ms. Default width is 20ms. Try 30ms to lower
1059 pitch, and 10ms to raise pitch. ''interpole'' option, can
1060 be
1061 ''fade'' option, can be
1062 ''
1063
1064
1065 polyphase [[ ''-w'' ''nut'' / ''ham'' ]
1066
1067
1068
1069 [[ ''-width'' ''long'' / ''short'' / ''#'' ]
1070
1071
1072
1073 [[ ''-cutoff #'' ]
1074
1075
1076 Translate input sampling rate to output sampling rate via
1077 polyphase interpolation, a DSP algorithm. This method is
1078 slow and uses lots of RAM, but gives much better results
1079 than __rate.__
1080
1081
1082 -w
1083 nut.''
1084
1085
1086 -width long / short / # : specify the (approximate) width of
1087 the filter. ''long'' is 1024 samples; ''short'' is 128
1088 samples. Alternatively, an exact number can be used. Default
1089 is ''long.'' The ''short'' option is __not__
1090 recommended, as it produces poor quality
1091 results.
1092
1093
1094 -cutoff # : specify the filter cutoff frequency in terms of
1095 fraction of frequency bandwidth, also known as the Nyquist
1096 frequency. Please see the ''resample'' effect for further
1097 information on Nyquist frequency. If upsampling, then this
1098 is the fraction of the original signal that should go
1099 through. If downsampling, this is the fraction of the signal
1100 left after downsampling. Default is 0.95. Remember that this
1101 is a float.
1102
1103
1104 rate
1105
1106
1107 Translate input sampling rate to output sampling rate via
1108 linear interpolation to the Least Common Multiple of the two
1109 sampling rates. This is the default effect if the two files
1110 have different sampling rates and the preview option was
1111 specified. This is fast but noisy: the spectrum of the
1112 original sound will be shifted upwards and duplicated
1113 faintly when up-translating by a multiple.
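
As a rough illustration of the intermediate rate mentioned above, the Least Common Multiple of two sampling rates can be computed as follows. This is only a sketch of the arithmetic, not SoX's own code; the function name `intermediate_rate` is made up for this example.

```python
from math import gcd


def intermediate_rate(in_rate: int, out_rate: int) -> int:
    """Return the least common multiple of two sampling rates, in Hz.

    Hypothetical helper for illustration; it is not part of SoX.
    """
    return in_rate * out_rate // gcd(in_rate, out_rate)


# Converting 8000 Hz audio to 44100 Hz implies a very high common rate,
# which hints at why simple linear interpolation is only a rough method.
print(intermediate_rate(8000, 44100))
```

When one rate is an exact multiple of the other (for example 22050 and 44100), the common rate is simply the higher of the two.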
1114
1115
1116 Lerp-ing is acceptable for cheap 8-bit sound hardware, but
1117 for CD-quality sound you should instead use either
1118 __resample__ or __polyphase.__ If you are wondering
1119 which rate changing effects to use, you will want to read a
1120 detailed analysis of all of them at
1121 http://eakaw2.et.tu-dresden.de/~wilde/resample/resample.html
1122
1123
1124 resample [[ ''-qs'' __|__ ''-q'' __|__ ''-ql''
1125 __] [[__ ''rolloff'' __[[__ ''beta'' __]
1126 ]__
1127
1128
1129 Translate input sampling rate to output sampling rate via
1130 simulated analog filtration. This method is slower than
1131 __rate,__ but gives much better results.
1132
1133
1134 By default, linear interpolation is used, with a window
1135 width of about 45 samples at the lower of the two rates. This
1136 gives an accuracy of about 16 bits, but insufficient
1137 stopband rejection in the case that you want to have rolloff
1138 greater than about 0.80 of the Nyquist
1139 frequency.
1140
1141
1142 The ''-q*'' options will change the default values for
1143 rolloff and beta as well as use quadratic interpolation of
1144 filter coefficients, resulting in about 24 bits precision.
1145 The ''-qs'', ''-q'', or ''-ql'' options specify
1146 increased accuracy at the cost of lower execution speed. It
1147 is optional to specify rolloff and beta parameters when
1148 using the ''-q*'' options.
1149
1150
1151 Following is a table of the reasonable defaults which are
1152 built-in to SoX:
1153
1154
1155 __Option Window rolloff beta interpolation
1156 ------ ------ ------- ---- -------------__
1157 (none) 45 0.80 16 linear
1158 -qs 45 0.80 16 quadratic
1159 -q 75 0.875 16 quadratic
1160 -ql 149 0.94 16 quadratic__
1161 ------ ------ ------- ---- -------------__
1162
1163
1164 ''-qs'', ''-q'', or ''-ql'' use window lengths of
1165 45, 75, or 149 samples, respectively, at the lower
1166 sample-rate of the two files. This means progressively
1167 sharper stop-band rejection, at proportionally slower
1168 execution times.
1169
1170
1171 ''rolloff'' refers to the cut-off frequency of the low
1172 pass filter and is given in terms of the Nyquist frequency
1173 for the lower sample rate. rolloff therefore should be
1174 something between 0.0 and 1.0, in practice 0.8-0.95. The
1175 defaults are indicated above.
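
To make the relationship between ''rolloff'' and an actual frequency concrete, the sketch below converts a rolloff fraction into a cutoff in Hz, following the description above (a fraction of the Nyquist frequency of the lower sample rate). The function name `rolloff_cutoff_hz` is an assumption for this example and does not exist in SoX.

```python
def rolloff_cutoff_hz(in_rate: float, out_rate: float, rolloff: float = 0.80) -> float:
    """Cutoff frequency in Hz implied by a rolloff fraction.

    Hypothetical helper: rolloff is taken relative to the Nyquist
    frequency (sample rate / 2) of the lower of the two rates.
    """
    if not 0.0 <= rolloff <= 1.0:
        raise ValueError("rolloff should be between 0.0 and 1.0")
    nyquist = min(in_rate, out_rate) / 2
    return rolloff * nyquist


# Downsampling 44100 Hz to 22050 Hz with the default rolloff of 0.80
# puts the cutoff at roughly 8820 Hz.
print(rolloff_cutoff_hz(44100, 22050))
```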
1176
1177
1178 The ''Nyquist frequency'' is equal to (sample rate / 2).
1179 Logically, this is because the A/D converter needs at least
1180 2 samples to detect 1 cycle at the Nyquist frequency.
1181 Frequencies higher than the Nyquist will actually appear as
1182 lower frequencies to the A/D converter; this is called
1183 aliasing. Normally, A/D converters run the signal through a
1184 lowpass filter first to avoid these problems.
1185
1186
1187 Similar problems will happen in software when reducing the
1188 sample rate of an audio file (frequencies above the new
1189 Nyquist frequency can be aliased to lower frequencies).
1190 Therefore, a good resample effect will remove all frequency
1191 information above the new Nyquist frequency.
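
The folding of frequencies around the Nyquist frequency can be shown with a few lines of arithmetic. This is a minimal sketch for a single pure tone sampled without any filtering; `apparent_frequency` is a name invented for this example.

```python
def apparent_frequency(freq_hz: float, sample_rate: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    Illustrative only: tones at or below the Nyquist frequency
    (sample_rate / 2) are unchanged, higher tones fold back below it.
    """
    nyquist = sample_rate / 2
    folded = freq_hz % sample_rate
    return sample_rate - folded if folded > nyquist else folded


# At a sample rate of 8000 Hz the Nyquist frequency is 4000 Hz, so a
# 5000 Hz tone is indistinguishable from a 3000 Hz tone.
print(apparent_frequency(5000, 8000))
```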
1192
1193
1194 The ''rolloff'' refers to how close to the Nyquist
1195 frequency this cutoff is, with closer being better. When
1196 increasing the sample rate of an audio file you would not
1197 expect to have any frequencies exist that are past the
1198 original Nyquist frequency. Because of resampling
1199 properties, it is common to have aliasing data created that
1200 is above the old Nyquist frequency. In that case the
1201 ''rolloff'' refers to how close to the original Nyquist
1202 frequency to use a lowpass filter to remove this false
1203 data, with closer also being better.
1204
1205
1206 The ''beta'' parameter determines the type of filter
1207 window used. Any value greater than 2.0 is the beta for a
1208 Kaiser window. Beta
1209 ''
1210
1211
1212 In the case of Kaiser window (beta
1213
1214
1215 This is the default effect if the two files have different
1216 sampling rates. Default parameters are, as indicated above,
1217 Kaiser window of length 45, rolloff 0.80, beta 16, linear
1218 interpolation.
1219
1220
1221 __NOTE:__ ''-qs'' is only slightly slower, but more
1222 accurate for 16-bit or higher precision.
1223
1224
1225 __NOTE:__ In many cases of up-sampling, no interpolation
1226 is needed, as exact filter coefficients can be computed in a
1227 reasonable amount of space. To be precise, this is done
1228 when
1229
1230
1231 input_rate
1232 output_rate/gcd(input_rate,output_rate)
1233
1234
1235 reverb ''gain-out delay'' [[ ''delay ...''
1236 ]
1237
1238
1239 Add reverberation to a sound sample. Each delay is given in
1240 milliseconds and its feedback depends on the
1241 reverb-time in milliseconds. Each delay should be in the
1242 range of half to quarter of reverb-time to get a realistic
1243 reverberation. Gain-out is the volume of the
1244 output.
1245
1246
1247 reverse
1248
1249
1250 Reverse the sound sample completely. Included for finding
1251 Satanic subliminals.
1252
1253
1254 __silence__ ''above_periods'' [[ ''duration
1255 threshold''[[ ''d'' | ''%'' ]
1256
1257
1258 [[ ''below_periods duration''
1259
1260
1261 threshold[[ ''d'' | ''%'' ]]
1262
1263
1264 Removes silence from the beginning or end of a sound file.
1265 Silence is anything below a specified threshold.
1266 When trimming silence from the beginning of a sound file,
1267 you specify a duration of audio that is above a given
1268 silence threshold before audio data is processed. You can
1269 also specify the count of periods of non-silence you want
1270 to detect before processing audio data. Specify a period of
1271 0 if you do not want to trim data from the front of the
1272 sound file.
1273 When optionally trimming silence from the end of a sound
1274 file, you specify the duration of audio that must be below a
1275 given threshold before processing of audio data stops. A
1276 count of periods that occur below the threshold may also be
1277 specified. If these options are not specified then data is
1278 not trimmed from the end of the audio file.
1279 Duration counts may be in the format of time, hh:mm:ss.frac,
1280 or in the exact count of samples.
1281 Threshold may be suffixed with d or % to indicate that the
1282 value is in decibels or a percentage of the maximum
1283 sample value. A value of '0%' will look for total
1284 silence.
1285
1286
1287 speed [[ -c ] ''factor''
1288
1289
1290 Speed up or slow down the sound, as a magnetic tape with a speed
1291 control. It affects both pitch and time. A factor of 1.0
1292 means no change, and is the default. 2.0 doubles speed, thus
1293 time length is cut by a half and pitch is one octave higher.
1294 0.5 halves speed thus time length doubles and pitch is one
1295 octave lower. If the optional -c parameter is used then the
1296 factor is specified in
1297
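
The effect of a plain speed ''factor'' on duration and pitch, as described above, can be worked out with a small calculation. This is only a sketch of the arithmetic; the function name `speed_effect` is hypothetical, and the -c form of the factor is not covered.

```python
from math import log2


def speed_effect(duration_s: float, factor: float) -> tuple[float, float]:
    """Return (new duration in seconds, pitch change in octaves).

    Illustrative helper, not SoX code: a factor of 2.0 halves the
    duration and raises the pitch by one octave.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return duration_s / factor, log2(factor)


print(speed_effect(10.0, 2.0))   # half as long, one octave higher
print(speed_effect(10.0, 0.5))   # twice as long, one octave lower
```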
1298
1299 split
1300
1301
1302 Turn a mono sample into a stereo sample by copying the input
1303 channel to the left and right channels.
1304
1305
1306 stat [[ ''-s n'' __] [[__''-rms'' __] [[__
1307 ''-v'' __] [[__ ''-d'' __]__
1308
1309
1310 Do a statistical check on the input file, and print results
1311 on the standard error file. Audio data is passed unmodified
1312 from input to output file unless used along with the
1313 __-e__ option.
1314
1315
1316 The
1317 -v__ ''number'' which
1318 will make the sample as loud as possible without
1319 clipping.
1320
1321
1322 The option __-v__ will print out the
1323 __
1324
1325
1326 The __-s n__ option is used to scale the input data by a
1327 given factor. The default value of n is the max value of a
1328 signed long variable (0x7fffffff). Internal effects always
1329 work with signed long PCM data and so the value should
1330 relate to this fact.
1331
1332
1333 The __-rms__ option will convert all output average
1334 values to ''root mean square'' format.
1335
1336
1337 There is also an optional parameter __-d__ that will
1338 print out a hex dump of the sound file from the internal
1339 buffer that is in 32-bit signed PCM data. This is mainly
1340 only of use in tracking down endian problems that creep in
1341 to SoX on cross-platform versions.
1342
1343
1344 stretch ''factor [[window fade shift
1345 fading]''
1346
1347
1348 Time stretch file by a given factor. Change duration without
1349 affecting the pitch. ''factor'' of stretching:
1350 ''window'' size is in
1351 ms. Default is 20ms. The ''fade'' option, can be
1352 ''shift'' ratio, in [[0.0 1.0]. Default
1353 depends on stretch factor. 1.0 to shorten, 0.8 to lengthen.
1354 The ''fading'' ratio, in [[0.0 0.5]. The amount of a
1355 fade's default depends on factor and shift.
1356
1357
1358 swap [[ ''1 2'' __|__ ''1 2 3 4''
1359 __]__
1360
1361
1362 Swap channels in multi-channel sound files. Optionally, you
1363 may specify the channel order you would like the output in.
1364 This defaults to output channel 2 and then 1 for stereo and
1365 2, 1, 4, 3 for quad-channels. An interesting feature is that
1366 you may duplicate a given channel by overwriting another.
1367 This is done by repeating an output channel on the command
1368 line. For example, swap 2 2 will overwrite channel 1 with
1369 channel 2's data; creating a stereo file with both channels
1370 containing the same audio data.
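
The channel reordering described above can be pictured as picking, for each output position, which input channel to copy. The sketch below works on a list of sample frames; `swap_channels` and its arguments are names assumed for this example and are not part of SoX.

```python
def swap_channels(frames, order):
    """Reorder the channels of each frame.

    frames: list of tuples, one sample per channel.
    order:  1-based input channel numbers, one per output channel,
            written the same way as on the swap command line.
    Hypothetical helper for illustration only.
    """
    return [tuple(frame[ch - 1] for ch in order) for frame in frames]


stereo = [(10, 20), (11, 21)]
print(swap_channels(stereo, (2, 1)))  # exchange left and right
print(swap_channels(stereo, (2, 2)))  # channel 2 copied to both outputs
```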
1371
1372
1373 synth [[ ''length'' ] ''type mix'' [[ ''freq'' [[
1374 ''-freq2'' ]
1375
1376
1377 [[ ''off'' ] [[ ''ph'' ] [[ ''p1'' ] [[ ''p2'' ] [[
1378 ''p3'' ]
1379
1380
1381 The synth effect will generate various types of audio data.
1382 Although this effect is used to generate audio data, an
1383 input file must be specified. The length of the input audio
1384 file determines the length of the output audio file.
1385
1386
1387 trim ''start'' [[ ''length'' ]
1388
1389
1390 Trim can trim off unwanted audio data from the beginning and
1391 end of the audio file. Audio samples are not sent to the
1392 output stream until the ''start'' location is
1393 reached.
1394 The optional ''length'' parameter tells the number of
1395 samples to output after the ''start'' sample and is used
1396 to trim off the back side of the audio data. Using a value
1397 of 0 for the ''start'' parameter will allow trimming off
1398 the back side only.
1399 Both options can be specified using either an amount of time
1400 or an exact count of samples. The format for specifying
1401 lengths in time is hh:mm:ss.frac. A start value of 1:30.5
1402 will not start until 1 minute, thirty and 1/2 seconds into
1403 the audio data. The format for specifying sample counts is
1404 the number of samples with the letter 's' appended to it. A
1405 value of 8000s will wait until 8000 samples are read before
1406 starting to process audio data.
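
The two position formats described above (a time such as 1:30.5, or a sample count such as 8000s) can be converted to a sample offset once the sample rate is known. The following is a minimal sketch of that conversion, not SoX's actual parser; `position_to_samples` is a made-up name and no error handling for malformed input is attempted.

```python
def position_to_samples(spec: str, sample_rate: int) -> int:
    """Convert a trim position to a number of samples.

    Accepts 'hh:mm:ss.frac' style times (leading fields optional)
    or a sample count with a trailing 's'. Illustrative only.
    """
    if spec.endswith("s"):
        # Exact sample count, e.g. "8000s"
        return int(spec[:-1])
    seconds = 0.0
    for field in spec.split(":"):
        # Each colon shifts the running total up by a factor of 60
        seconds = seconds * 60 + float(field)
    return round(seconds * sample_rate)


print(position_to_samples("1:30.5", 8000))  # 90.5 seconds of audio
print(position_to_samples("8000s", 8000))   # exactly 8000 samples
```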
1407
1408
1409 vibro ''speed'' __[[__ ''depth''
1410 __]__
1411
1412
1413 Add the world-famous Fender Vibro-Champ sound effect to a
1414 sound sample by using a sine wave as the volume knob.
1415 __Speed__ gives the Hertz value of the wave. This must be
1416 under 30. __Depth__ gives the amount the volume is cut
1417 into by the sine wave, ranging 0.0 to 1.0 and defaulting to
1418 0.5.
1419
1420
1421 vol ''gain'' [[ ''type'' __[[__ ''limitergain'' ]
1422 ]
1423
1424
1425 The vol effect is much like the command line option -v. It
1426 allows you to adjust the volume of an input file and allows
1427 you to specify the adjustment in relation to amplitude,
1428 power, or dB. If ''type'' is not specified then it
1429 defaults to ''amplitude''.
1430 When type is ''amplitude'' then a linear change of the
1431 amplitude is performed based on the gain. Therefore, a value
1432 of 1.0 will keep the volume the same, 0.0 to
1433 ''
1434 When type is ''power'' then a value of 1.0 also means no
1435 change in volume.
1436 When type is ''dB'' the amplitude is changed
1437 logarithmically. 0.0 is constant while +6 doubles the
1438 amplitude.
1439 An optional ''limitergain'' value can be specified and
1440 should be a value much less than 1.0 (e.g. 0.05 or 0.02) and
1441 is used only on peaks to prevent clipping. Not specifying
1442 this parameter will cause no limiter to be used. In verbose
1443 mode, this effect will display the percentage of audio data
1444 that needed to be limited.
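
The statement that +6 dB doubles the amplitude follows from the usual decibel formula for amplitude ratios. The sketch below shows that conversion; `db_to_amplitude` is a name chosen for this example, and the code illustrates the arithmetic rather than SoX's implementation.

```python
def db_to_amplitude(gain_db: float) -> float:
    """Linear amplitude factor for a gain given in dB.

    Uses the standard amplitude relation factor = 10 ** (dB / 20).
    Hypothetical helper for illustration only.
    """
    return 10 ** (gain_db / 20)


print(db_to_amplitude(0.0))   # no change in volume
print(db_to_amplitude(6.0))   # close to a doubling of the amplitude
print(db_to_amplitude(-6.0))  # close to halving the amplitude
```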
1445 !!BUGS
1446
1447
1448 The syntax is horrific. That's the breaks when trying to
1449 handle all things from the command line.
1450
1451
1452 Please report any bugs found in this version of SoX to Chris
1453 Bagwell (cbagwell@sprynet.com)
1454 !!FILES
1455 !!SEE ALSO
1456
1457
1458 play(1), rec(1),
1459 __soxexam(1)__
1460 !!NOTICES
1461
1462
1463 The version of SoX that accompanies this manual page is
1464 supported by Chris Bagwell (cbagwell@users.sourceforge.net).
1465 Please refer any questions regarding it to this address. You
1466 may obtain the latest version at the web site
1467 http://sox.sourceforge.net/
1468 !!AUTHOR
1469
1470
1471 Chris Bagwell (cbagwell@users.sourceforge.net).
1472
1473
1474 Updates by Anonymous
1475 ----