SoX(1)				Sound eXchange				SoX(1)

       SoX - Sound eXchange, the Swiss Army knife of audio manipulation

       sox [global-options] [format-options] infile1
	    [[format-options] infile2] ... [format-options] outfile
	    [effect [effect-options]] ...

       play [global-options] [format-options] infile1
	    [[format-options] infile2] ... [format-options]
	    [effect [effect-options]] ...

       rec [global-options] [format-options] outfile
	    [effect [effect-options]] ...

       SoX  reads  and	writes audio files in most popular formats and can op-
       tionally	apply effects to them. It can combine multiple input  sources,
       synthesise  audio, and, on many systems,	act as a general purpose audio
       player or a multi-track audio recorder. It also has limited ability  to
       split the input into multiple output files.

       All SoX functionality is	available using	just the sox command.  To sim-
       plify playing and recording audio, if SoX is invoked as play, the  out-
       put  file  is  automatically set	to be the default sound	device,	and if
       invoked as rec, the default sound device	is used	as  an	input  source.
       Additionally,  the  soxi(1)  command  provides a	convenient way to just
       query audio file	header information.

       The heart of SoX	is a library called libSoX.  Those interested  in  ex-
       tending	SoX  or	 using it in other programs should refer to the	libSoX
       manual page: libsox(3).

       SoX is a	command-line audio processing  tool,  particularly  suited  to
       making quick, simple edits and to batch processing.  If you need	an in-
       teractive, graphical audio editor, use audacity(1).

				 *	  *	   *

       The overall SoX processing chain	can be summarised as follows:

		    Input(s) ->	Combiner -> Effects -> Output(s)

       Note however, that on the SoX command line, the positions of  the  Out-
       put(s)  and the Effects are swapped w.r.t. the logical flow just	shown.
       Note also that whilst options pertaining	to  files  are	placed	before
       their  respective file name, the	opposite is true for effects.  To show
       how this	works in practice, here	is a selection of examples of how  SoX
       might be	used.  The simple
          sox recital.au recital.wav
       translates an audio file in Sun AU format to a Microsoft WAV file,
       whilst
          sox recital.au -b 16 recital.wav channels 1 rate 16k fade 3 norm
       performs the same format translation, but also applies four effects
       (down-mix to one channel, sample rate change, fade-in, normalize), and
       stores the result at a bit-depth of 16.
	  sox -r 16k -e	signed -b 8 -c 1 voice-memo.raw	voice-memo.wav
       converts `raw' (a.k.a. `headerless') audio to a self-describing file
       format,
	  sox slow.aiff	fixed.aiff speed 1.027
       adjusts audio speed,
	  sox short.wav	long.wav longer.wav
       concatenates two	audio files, and
	  sox -m music.mp3 voice.wav mixed.flac
       mixes together two audio	files.
	  play "The Moonbeams/Greatest/*.ogg" bass +3
       plays a collection of audio files whilst applying a bass boosting
       effect,
	  play -n -c1 synth sin	%-12 sin %-9 sin %-5 sin %-2 fade h 0.1	1 0.1
       plays a synthesised `A minor seventh' chord with	a pipe-organ sound,
	  rec -c 2 radio.aiff trim 0 30:00
       records half an hour of stereo audio, and
	  play -q take1.aiff & rec -M take1.aiff take1-dub.aiff
       (with POSIX shell and where supported by	hardware) records a new	 track
       in a multi-track	recording.  Finally,
	  rec -r 44100 -b 16 -e	signed-integer -p \
	    silence 1 0.50 0.1%	1 10:00	0.1% | \
	    sox	-p song.ogg silence 1 0.50 0.1%	1 2.0 0.1% : \
	    newfile : restart
       records a stream of audio such as LP/cassette and splits it into multiple
       audio files at points with 2 seconds of silence.	  Also,	 it  does  not
       start  recording	 until	it detects audio is playing and	stops after it
       sees 10 minutes of silence.

       N.B.  The above is just an overview of SoX's capabilities; detailed ex-
       planations  of how to use all SoX parameters, file formats, and effects
       can be found below in this manual, in soxformat(7), and in soxi(1).

   File	Format Types
       SoX can work with `self-describing' and `raw' audio  files.   `self-de-
       scribing'  formats  (e.g. WAV, FLAC, MP3) have a	header that completely
       describes the signal and	encoding attributes of	the  audio  data  that
       follows.	`raw' or `headerless' formats do not contain this information,
       so the audio characteristics of these must be described on the SoX com-
       mand line or inferred from those	of the input file.

       The  following  four characteristics are	used to	describe the format of
       audio data such that it can be processed	with SoX:

       sample rate
	      The sample rate in samples per second (`Hertz' or	`Hz').	 Digi-
	      tal  telephony  traditionally  uses  a  sample  rate  of 8000 Hz
	      (8 kHz), though these days, 16 and even 32 kHz are becoming more
	      common. Audio Compact Discs use 44100 Hz (44.1 kHz). Digital Au-
	      dio Tape and many	computer systems use 48	kHz. Professional  au-
	      dio systems often	use 96 kHz.

       sample size
	      The  number of bits used to store	each sample.  Today, 16-bit is
	      commonly used. 8-bit was popular in the early days  of  computer
	      audio.  24-bit  is  used	in the professional audio arena. Other
	      sizes are	also used.

       data encoding
	      The way in which each  audio  sample  is	represented  (or  `en-
	      coded').	 Some  encodings have variants with different byte-or-
	      derings or bit-orderings.	 Some compress the audio data so  that
	      the  stored  audio  data takes up	less space (i.e. disk space or
	      transmission bandwidth) than the other format parameters and the
              number of samples would imply.  Commonly-used encoding types
              include floating-point, μ-law, ADPCM, signed-integer PCM, MP3,
	      and FLAC.

       channels
              The number of audio channels contained in the file.  One
	      (`mono') and two (`stereo') are widely used.   `Surround	sound'
	      audio typically contains six or more channels.

       The  term  `bit-rate' is	a measure of the amount	of storage occupied by
       an encoded audio	signal over a unit of time.  It	can depend on  all  of
       the  above and is typically denoted as a	number of kilo-bits per	second
       (kbps).	An A-law telephony signal has a	bit-rate of 64	kbps.  MP3-en-
       coded  stereo  music typically has a bit-rate of	128-196	kbps. FLAC-en-
       coded stereo music typically has	a bit-rate of 550-760 kbps.
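
       For audio stored with a fixed number of bits per sample (uncompressed
       PCM, or A-law at 8 bits), the bit-rate follows directly from the
       characteristics above: sample rate x sample size x channels.  A minimal
       shell sketch of that arithmetic; the function name pcm_kbps is
       illustrative and is not part of SoX:

```shell
# pcm_kbps RATE BITS CHANNELS
# Bit-rate in kbps (rounded down) of audio stored with a fixed number of
# bits per sample.  Hypothetical helper, not a SoX command.
pcm_kbps() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

pcm_kbps 8000 8 1      # A-law telephony: prints 64
pcm_kbps 44100 16 2    # CD audio: prints 1411
```

       Compressed encodings such as MP3 and FLAC do not follow this formula,
       which is why their typical bit-rates are quoted separately.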

       Most self-describing formats also allow textual `comments' to be	embed-
       ded  in	the  file  that	can be used to describe	the audio in some way,
       e.g. for	music, the title, the author, etc.

       One important use of audio file comments	is to convey `Replay Gain' in-
       formation.   SoX	supports applying Replay Gain information (for certain
       input file formats only;	currently, at least FLAC and Ogg Vorbis),  but
       not  generating	it.   Note that	by default, SoX	copies input file com-
       ments to	output files that support comments, so output files  may  con-
       tain Replay Gain	information if some was	present	in the input file.  In
       this case, if anything other than a simple format conversion  was  per-
       formed then the output file Replay Gain information is likely to	be in-
       correct and so should be	recalculated using a tool that	supports  this
       (not SoX).

       The soxi(1) command can be used to display information from audio file
       headers.

   Determining & Setting The File Format
       There are several mechanisms available for SoX to use to	 determine  or
       set the format characteristics of an audio file.	 Depending on the cir-
       cumstances, individual characteristics may be determined	or  set	 using
       different mechanisms.

       To  determine  the  format  of an input file, SoX will use, in order of
       precedence and as given or available:

       1.  Command-line	format options.

       2.  The contents	of the file header.

       3.  The filename	extension.

       To set the output file format, SoX will use, in order of	precedence and
       as given	or available:

       1.  Command-line	format options.

       2.  The filename	extension.

       3.  The	input file format characteristics, or the closest that is sup-
	   ported by the output	file type.

       For all files, SoX will exit with an error if the file type  cannot  be
       determined. Command-line	format options may need	to be added or changed
       to resolve the problem.

   Playing & Recording Audio
       The play	and rec	commands  are  provided	 so  that  basic  playing  and
       recording is as simple as
	  play existing-file.wav
	  rec new-file.wav
       These two commands are functionally equivalent to
	  sox existing-file.wav	-d
	  sox -d new-file.wav
       Of  course,  further  options  and  effects (as described below)	can be
       added to	the commands in	either form.

				 *	  *	   *

       Some systems provide more  than	one  type  of  (SoX-compatible)	 audio
       driver,	e.g.  ALSA  &  OSS, or SUNAU & AO.  Systems can	also have more
       than one	audio device (a.k.a. `sound card').  If	more  than  one	 audio
       driver  has  been built-in to SoX, and the default selected by SoX when
       recording or playing is not the one that	is  wanted,  then  the	AUDIO-
       DRIVER  environment  variable can be used to override the default.  For
       example (on many	systems):
	  set AUDIODRIVER=oss
	  play ...
       The AUDIODEV environment	variable can be	used to	override  the  default
       audio device, e.g.
	  set AUDIODEV=/dev/dsp2
	  play ...
	  sox ... -t oss
       or
          set AUDIODEV=hw:soundwave,1,2
	  play ...
	  sox ... -t alsa
       Note  that  the way of setting environment variables varies from	system
       to system - for some specific examples, see `SOX_OPTS' below.
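
       In a POSIX shell (sh, bash), set assigns positional parameters rather
       than environment variables, so the examples above need export instead.
       A sketch, reusing the driver and device names from those examples
       (they are placeholders and may not exist on a given system):

```shell
# Export for the rest of the session (POSIX sh / bash).
export AUDIODRIVER=oss
export AUDIODEV=/dev/dsp2

# Or set them for a single command only, without changing the session:
#   AUDIODRIVER=oss AUDIODEV=/dev/dsp2 play existing-file.wav
```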

       When playing a file with	a sample rate that is not supported by the au-
       dio  output  device,  SoX  will automatically invoke the	rate effect to
       perform the necessary sample rate conversion.  For  compatibility  with
       old  hardware, the default rate quality level is	set to `low'. This can
       be changed by explicitly	specifying the rate effect  with  a  different
       quality level, e.g.
	  play ... rate	-m
       or by using the --play-rate-arg option (see below).

				 *	  *	   *

       On some systems,	SoX allows audio playback volume to be adjusted	whilst
       using play.  Where supported, this is achieved by tapping the `v' & `V'
       keys during playback.

       To  help	 with setting a	suitable recording level, SoX includes a peak-
       level meter which can be	invoked	(before	making the  actual  recording)
       as follows:
	  rec -n
       The recording level should be adjusted (using the system-provided mixer
       program,	not SoX) so that the meter is at most occasionally full	scale,
       and never `in the red' (an exclamation mark is shown).  See also -S
       below.

   Accuracy
       Many file formats that compress audio discard some of the audio signal
       information  whilst doing so. Converting	to such	a format and then con-
       verting back again will not produce an exact copy of the	 original  au-
       dio.   This is the case for many	formats	used in	telephony (e.g.	A-law,
       GSM) where low signal bandwidth is more important than high  audio  fi-
       delity,	and for	many formats used in portable music players (e.g. MP3,
       Vorbis) where adequate fidelity can be retained	even  with  the	 large
       compression ratios that are needed to make portable players practical.

       Formats that discard audio signal information are called	`lossy'.  For-
       mats that do not	are called `lossless'.	The term `quality' is used  as
       a  measure  of  how closely the original	audio signal can be reproduced
       when using a lossy format.

       Audio file conversion with SoX is lossless when it can  be,  i.e.  when
       not  using  lossy  compression,	when not reducing the sampling rate or
       number of channels, and when the	number of bits used in the destination
       format is not less than in the source format.  E.g.  converting from an
       8-bit PCM format	to a 16-bit PCM	format is lossless but converting from
       an 8-bit	PCM format to (8-bit) A-law isn't.

       N.B.   SoX  converts all	audio files to an internal uncompressed	format
       before performing any audio processing. This means that manipulating  a
       file that is stored in a	lossy format can cause further losses in audio
       fidelity.  E.g. with
	  sox long.mp3 short.mp3 trim 10
       SoX first decompresses the input	MP3 file, then applies	the  trim  ef-
       fect, and finally creates the output MP3	file by	re-compressing the au-
       dio - with a possible reduction in fidelity above that  which  occurred
       when  the input file was	created.  Hence, if what is ultimately desired
       is lossily compressed audio, it is highly recommended  to  perform  all
       audio  processing  using	 lossless file formats and then	convert	to the
       lossy format only at the	final stage.

       N.B.  Applying multiple effects with a single SoX invocation  will,  in
       general,	produce	more accurate results than those produced using	multi-
       ple SoX invocations.

   Dithering
       Dithering is a technique used to maximise the dynamic range of audio
       stored  at a particular bit-depth. Any distortion introduced by quanti-
       sation is decorrelated by adding	a small	amount of white	noise  to  the
       signal.  In most cases, SoX can determine whether the selected
       processing requires dither and will add it during output formatting if
       appropriate.

       Specifically,  by  default, SoX automatically adds TPDF dither when the
       output bit-depth	is less	than 24	and any	of the following are true:

       o   bit-depth reduction has been	specified explicitly using a  command-
	   line	option

       o   the	output file format supports only bit-depths lower than that of
	   the input file format

       o   an effect has increased effective  bit-depth	 within	 the  internal
	   processing chain

       For  example,  adjusting	 volume	 with vol 0.25 requires	two additional
       bits in which to	losslessly  store  its	results	 (since	 0.25  decimal
       equals  0.01 binary).  So if the	input file bit-depth is	16, then SoX's
       internal	representation will utilise 18 bits after processing this vol-
       ume  change.  In	order to store the output at the same depth as the in-
       put, dithering is used to remove	the additional bits.
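
       The bit-growth arithmetic in this example can be checked with a small
       shell sketch.  It assumes the volume factor is a power of two below 1
       (0.5, 0.25, ...); other factors need more careful treatment.  The
       helper name extra_bits is illustrative only; SoX does this bookkeeping
       internally:

```shell
# extra_bits FACTOR
# Additional bits needed to hold a sample exactly after scaling by FACTOR,
# where FACTOR is a power of two below 1 (0.5, 0.25, ...).  Illustrative
# helper only, not a SoX command.
extra_bits() {
    awk -v f="$1" 'BEGIN { printf "%.0f\n", log(1 / f) / log(2) }'
}

extra_bits 0.25                        # prints 2
echo $(( 16 + $(extra_bits 0.25) ))    # prints 18
```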

       Use the -V option to see	what processing	SoX has	 automatically	added.
       The  -D option may be given to override automatic dithering.  To	invoke
       dithering manually (e.g.	to select  a  noise-shaping  curve),  see  the
       dither effect.

   Clipping
       Clipping is distortion that occurs when an audio signal level (or
       `volume') exceeds the range of the chosen representation.  In most cases,
       clipping	 is  undesirable  and  so should be corrected by adjusting the
       level prior to the point	(in the	processing chain) at which it occurs.

       In SoX, clipping	could occur, as	you might expect, when using  the  vol
       or gain effects to increase the audio volume. Clipping could also occur
       with many other effects,	when converting	one  format  to	 another,  and
       even when simply	playing	the audio.

       Playing an audio	file often involves resampling,	and processing by ana-
       logue components	can introduce a	small DC offset	and/or	amplification,
       all  of which can produce distortion if the audio signal	level was ini-
       tially too close	to the clipping	point.

       For these reasons, it is	usual to make sure that	an audio file's	signal
       level  has  some	`headroom', i.e. it does not exceed a particular level
       below the maximum possible level	for the	 given	representation.	  Some
       standards  bodies recommend as much as 9dB headroom, but	in most	cases,
       3dB (≈ 70% linear) is enough.  Note that this wisdom seems to have
       been lost in modern music production; in fact, many CDs, MP3s, etc.
       are now mastered at levels above 0dBFS, i.e. the audio is clipped as
       delivered.
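
       The relationship between headroom in dB and linear level is
       100 x 10^(-dB/20) percent of full scale.  A small shell sketch of the
       conversion; headroom_pct is a hypothetical helper name, not part of
       SoX:

```shell
# headroom_pct DB
# Peak level, as a percentage of full scale, that leaves DB decibels of
# headroom: 100 * 10^(-DB/20).  Hypothetical helper, not a SoX command.
headroom_pct() {
    awk -v db="$1" 'BEGIN { printf "%.1f\n", 100 * exp(-db / 20 * log(10)) }'
}

headroom_pct 3    # prints 70.8
headroom_pct 9    # prints 35.5
```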

       SoX's stat and stats effects can	assist in determining the signal level
       in an audio file. The gain or vol effect	can be used to	prevent	 clip-
       ping, e.g.
	  sox dull.wav bright.wav gain -6 treble +6
       guarantees that the treble boost	will not clip.

       If  clipping  occurs at any point during	processing, SoX	will display a
       warning message to that effect.

       See also	-G and the gain	and norm effects.

   Input File Combining
       SoX's input combiner can	be configured (see OPTIONS below)  to  combine
       multiple	 files using any of the	following methods: `concatenate', `se-
       quence',	`mix',	`mix-power',  `merge',	or  `multiply'.	  The  default
       method is `sequence' for	play, and `concatenate'	for rec	and sox.

       For  all	 methods other than `sequence',	multiple input files must have
       the same	sampling rate. If necessary, separate SoX invocations  can  be
       used to make sampling rate adjustments prior to combining.

       If  the	`concatenate' combining	method is selected (usually, this will
       be by default) then the input files must	also have the same  number  of
       channels.   The audio from each input will be concatenated in the order
       given to	form the output	file.

       The `sequence' combining	method is selected automatically for play.  It
       is  similar  to `concatenate' in	that the audio from each input file is
       sent serially to	the output file. However, here the output file may  be
       closed  and  reopened  at  the  corresponding  transition between input
       files. This may be just what is needed when sending different types  of
       audio  to an output device, but is not generally	useful when the	output
       is a normal file.

       If either the `mix' or `mix-power' combining method  is	selected  then
       two  or	more  input  files must	be given and will be mixed together to
       form the	output file.  The number of channels in	each input  file  need
       not  be the same, but SoX will issue a warning if they are not and some
       channels	in the output file will	not contain  audio  from  every	 input
       file.   A  mixed	audio file cannot be un-mixed without reference	to the
       original	input files.

       If the `merge' combining	method is selected  then  two  or  more	 input
       files  must  be	given  and  will be merged together to form the	output
       file.  The number of channels in	each input file	need not be the	 same.
       A merged	audio file comprises all of the	channels from all of the input
       files. Un-merging is possible using multiple invocations	 of  SoX  with
       the  remix effect.  For example,	two mono files could be	merged to form
       one stereo file.	The first and second mono files	would become the  left
       and right channels of the stereo	file.
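
       The merge and un-merge round trip described above might be scripted
       as follows.  This is a sketch only: the function names and the file
       names in the usage comment are placeholders, and it assumes two mono
       inputs with the same sampling rate:

```shell
# merge_stereo LEFT RIGHT OUT: combine two mono files into one stereo file.
merge_stereo() {
    sox -M "$1" "$2" "$3"
}

# split_stereo IN LEFT RIGHT: recover each channel with the remix effect,
# one SoX invocation per output file.
split_stereo() {
    sox "$1" "$2" remix 1 &&
    sox "$1" "$3" remix 2
}

# Usage (file names are placeholders):
#   merge_stereo left.wav right.wav stereo.wav
#   split_stereo stereo.wav left-copy.wav right-copy.wav
```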

       The  `multiply' combining method	multiplies the sample values of	corre-
       sponding	channels (treated as numbers in	the interval -1	 to  +1).   If
       the  number of channels in the input files is not the same, the missing
       channels	are considered to contain all zero.

       When combining input files, SoX applies any specified effects  (includ-
       ing, for	example, the vol volume	adjustment effect) after the audio has
       been combined. However, it is often useful to be	able to	set the	volume
       of (i.e. `balance') the inputs individually, before combining takes
       place.

       For all combining methods, input	file volume adjustments	 can  be  made
       manually	using the -v option (below) which can be given for one or more
       input files. If it is given for only some of the	input files  then  the
       others  receive no volume adjustment.  In some circumstances, automatic
       volume adjustments may be applied (see below).

       The -V option (below) can be used to show the input file	volume adjust-
       ments that have been selected (either manually or automatically).

       There are some special considerations that need to be made when mixing
       input files:

       Unlike the other	methods, `mix' combining has the  potential  to	 cause
       clipping	 in  the combiner if no	balancing is performed.	 In this case,
       if manual volume	adjustments are	not given, SoX will try	to ensure that
       clipping does not occur by automatically adjusting the volume (amplitude)
       of each input signal by a factor of 1/n, where n is the number
       of  input  files.  If this results in audio that	is too quiet or	other-
       wise unbalanced then the	input file volumes can be set manually as  de-
       scribed above. Using the	norm effect on the mix is another alternative.

       If mixed	audio seems loud enough	at some	points but too quiet in	others
       then dynamic range compression should be	applied	to correct this	-  see
       the compand effect.

       With  the `mix-power' combine method, the mixed volume is approximately
       equal to that of one of the input signals.  This is achieved by balancing
       using a factor of 1/sqrt(n) instead of 1/n.  Note that this balancing
       factor does not guarantee that clipping will not occur, but the
       number  of  clips  will	usually	be low and the resultant distortion is
       generally imperceptible.
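
       To get a feel for the two balancing factors, they can be computed for
       a given number of inputs.  A minimal sketch; mix_factors is an
       illustrative helper name, not a SoX command:

```shell
# mix_factors N
# Print the per-input balancing factors for N input files:
# first 1/N (`mix'), then 1/sqrt(N) (`mix-power').
mix_factors() {
    awk -v n="$1" 'BEGIN { printf "%.3f %.3f\n", 1 / n, 1 / sqrt(n) }'
}

mix_factors 2    # prints 0.500 0.707
mix_factors 4    # prints 0.250 0.500
```

       If neither factor gives a suitable balance, the per-input -v option
       described above can be used to supply values chosen by hand.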

   Output Files
       SoX's default behaviour is to take one or more input  files  and	 write
       them to a single	output file.

       This behaviour can be changed by	specifying the pseudo-effect `newfile'
       within the effects list.	 SoX will then enter multiple output mode.

       In multiple output mode,	a new file is created when the	effects	 prior
       to the `newfile' indicate they are done.  The effects chain listed after
       `newfile' is then started up and its output is saved to the new file.

       In multiple output mode,	a unique number	will automatically be appended
       to the end of all filenames.  If	the filename has an extension then the
       number is inserted before the extension.	 This behaviour	can be custom-
       ized by placing a %n anywhere in	the filename where the	number	should
       be  substituted.	 An optional number can	be placed after	the % to indi-
       cate a minimum fixed width for the number.

       Multiple	output mode is not very	useful unless an effect	that will stop
       the  effects  chain  early is specified before the `newfile'. If	end of
       file is reached before the effects chain	stops itself then no new  file
       will be created as it would be empty.

       The following is	an example of splitting	the first 60 seconds of	an in-
       put file	into two 30 second files and ignoring the rest.
	  sox song.wav ringtone%1n.wav trim 0 30 : newfile : trim 0 30

   Stopping SoX
       Usually SoX will	complete its processing	and exit automatically once it
       has read	all available audio data from the input	files.

       If desired, it can be terminated	earlier	by sending an interrupt	signal
       to the process (usually by pressing the keyboard	interrupt key which is
       normally	Ctrl-C).  This is a natural requirement	in some	circumstances,
       e.g. when using SoX to make a recording.	 Note that when	using  SoX  to
       play  multiple  files, Ctrl-C behaves slightly differently: pressing it
       once causes SoX to skip to the next file; pressing it  twice  in	 quick
       succession causes SoX to	exit.

       Another	option to stop processing early	is to use an effect that has a
       time period or sample count to determine	the stopping point.  The  trim
       effect  is  an  example	of this.  Once all effects chains have stopped
       then SoX	will also stop.

       Filenames can be	simple file names, absolute or relative	path names, or
       URLs  (input  files only).  Note	that URL support requires that wget(1)
       is available.

       Note: Giving SoX	an input or output filename that is the	same as	a  SoX
       effect-name  will  not  work  since  SoX	 will  treat  it  as an	effect
       specification.	The  only  work-around	to  this  is  to  avoid	  such
       filenames.  This	 is generally not difficult since most audio filenames
       have a filename `extension', whilst effect-names	do not.

   Special Filenames
       The following special filenames may be used in certain circumstances in
       place of	a normal filename on the command line:

       -      SoX can be used in simple pipeline operations by using the
              special filename `-' which, if used as an input filename, will
              cause SoX to read audio data from `standard input' (stdin),
              and which, if used as the output filename, will cause SoX to
              send audio data to `standard output' (stdout).  Note that when
              using this option for the output file, and sometimes when using
              it for an input file, the file-type (see -t below) must also be
              given.
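
              As an illustration, the following sketch sends audio through
              a pipe as header-less data.  The function name and the chosen
              format (8 kHz, 16-bit signed, mono) are assumptions made for
              the example; because raw data has no header, the same format
              options are repeated on the reading side:

```shell
# wav_via_raw_pipe IN OUT
# Send IN through a pipe as header-less 16-bit signed mono audio at 8 kHz
# and rebuild a self-describing file from it.  Hypothetical helper.
wav_via_raw_pipe() {
    fmt="-t raw -r 8000 -e signed-integer -b 16 -c 1"
    # $fmt is deliberately unquoted so that it splits into separate options.
    sox "$1" $fmt - | sox $fmt - "$2"
}

# Usage (file names are placeholders):
#   wav_via_raw_pipe input.wav output.wav
```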

       "|program [options] ..."
              This can be used in place of an input filename to specify that
              the given program's standard output (stdout) be used as an input
	      file.  Unlike - (above), this can	be used	for several inputs  to
	      one  SoX	command.   For	example,  if `genw' generates mono WAV
	      formatted	signals	to its standard	 output,  then	the  following
	      command makes a stereo file from two generated signals:
		 sox -M	"|genw --imd -"	"|genw --thd -"	out.wav
	      For  headerless  (raw)  audio,  -t (and perhaps other format op-
	      tions) will need to be given, preceding the input	command.

       "wildcard-filename"
              Specifies that filename `globbing' (wild-card matching) should
	      be performed by SoX instead of by	the shell.  This allows	a sin-
	      gle set of file options to be applied to a group of files.   For
	      example,	if  the	 current directory contains three `vox'	files,
	      file1.vox, file2.vox, and	file3.vox, then
		 play --rate 6k	*.vox
	      will be expanded by the `shell' (in most environments) to
		 play --rate 6k	file1.vox file2.vox file3.vox
	      which will treat only the	first vox file as having a sample rate
	      of 6k.  With
		 play --rate 6k	"*.vox"
              the given sample rate option will be applied to all three vox
              files.
       -p, --sox-pipe
	      This can be used in place	of an output filename to specify  that
              the SoX command should be used as an input pipe to another SoX
	      command.	For example, the command:
		 play "|sox -n -p synth	2" "|sox -n -p synth 2 tremolo 10" stat
	      plays two	`files'	in succession, each with different effects.

	      -p is in fact an alias for `-t sox -'.

       -d, --default-device
	      This can be used in place	of an  input  or  output  filename  to
	      specify  that  the  default  audio device	(if one	has been built
	      into SoX)	is to be used.	This is	akin to	invoking rec  or  play
	      (as described above).

       -n, --null
	      This  can	 be  used  in  place of	an input or output filename to
	      specify that a `null file' is to be used.	 Note that here, `null
	      file'  refers  to	a SoX-specific mechanism and is	not related to
	      any operating-system mechanism with a similar name.

	      Using a null file	to input audio is equivalent to	using a	normal
	      audio  file  that	contains an infinite amount of silence,	and as
	      such is not generally useful unless used	with  an  effect  that
	      specifies	a finite time length (such as trim or synth).

              Using a null file to output audio amounts to discarding the audio
              and is useful mainly with effects that produce information
              about the audio instead of affecting it (such as noiseprof or
              stat).

	      The sampling rate	associated with	a  null	 file  is  by  default
	      48 kHz,  but,  as	 with a	normal file, this can be overridden if
	      desired using command-line format	options	(see below).
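
              Both uses of the null file can be sketched in shell as
              follows.  The function names, the 8 kHz mono format and the
              440 Hz tone are choices made for the example only:

```shell
# make_tone OUT: three seconds of a 440 Hz sine, read from the null input
# at 8 kHz mono (format options given before -n override the 48 kHz default).
make_tone() {
    sox -r 8000 -c 1 -n "$1" synth 3 sine 440
}

# show_stats IN: discard the audio and print only the stat report,
# which SoX writes to standard error.
show_stats() {
    sox "$1" -n stat
}

# Usage (file name is a placeholder):
#   make_tone tone.wav
#   show_stats tone.wav
```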

   Supported File & Audio Device Types
       See soxformat(7)	for a list and description of the supported file  for-
       mats and	audio device drivers.

   Global Options
       These  options can be specified on the command line at any point	before
       the first effect	name.

       The SOX_OPTS environment	variable can be	used  to  provide  alternative
       default values for SoX's	global options.	 For example:
	  SOX_OPTS="--buffer 20000 --play-rate-arg -hs --temp /mnt/temp"
       Note  that  setting SOX_OPTS can	potentially create unwanted changes in
       the behaviour of	scripts	or other programs that invoke  SoX.   SOX_OPTS
       might  best  be used for	things (such as	in the given example) that re-
       flect the environment in	which SoX is being run.	 Enabling options such
       as  --no-clobber	as default might be handled better using a shell alias
       since a shell alias will	not affect operation in	scripts	etc.

       One way to ensure that a	script cannot be affected by  SOX_OPTS	is  to
       clear SOX_OPTS at the start of the script, but this of course loses the
       benefit of SOX_OPTS carrying some system-wide default options.  An  al-
       ternative approach is to	explicitly invoke SoX with default option val-
       ues, e.g.
	  SOX_OPTS="-V --no-clobber"
	  sox -V2 --clobber $input $output ...
       Note that the way to set	environment variables varies  from  system  to
       system. Here are	some examples:

       Unix bash:
	  export SOX_OPTS="-V --no-clobber"
       Unix csh:
	  setenv SOX_OPTS "-V --no-clobber"
       MS-DOS/MS-Windows:
          set SOX_OPTS=-V --no-clobber
       MS-Windows GUI: via Control Panel : System : Advanced : Environment
       Variables

       Mac OS X	GUI: Refer to Apple's Technical	Q&A QA1067 document.

       --buffer	BYTES, --input-buffer BYTES
	      Set the size in bytes of the buffers used	for  processing	 audio
	      (default	8192).	--buffer applies to input, effects, and	output
	      processing; --input-buffer applies only to input processing (for
	      which it overrides --buffer if both are given).

              Be aware that large values for --buffer will cause SoX to become
              slow to respond to requests to terminate or to skip the
	      current input file.

       --clobber
              Don't prompt before overwriting an existing file with the same
              name as that given for the output file.  This is the default
              behaviour.

       --combine concatenate|merge|mix|mix-power|multiply|sequence
	      Select the input file combining method; for some of these, short
	      options are available: -m	selects	`mix', -M selects `merge', and
	      -T selects `multiply'.

	      See  Input File Combining	above for a description	of the differ-
	      ent combining methods.

       -D, --no-dither
	      Disable automatic	dither - see `Dithering' above.	 An example of
	      why this might occasionally be useful is if a file has been con-
	      verted from 16 to	24 bit with the	intention of doing  some  pro-
	      cessing on it, but in fact no processing is needed after all and
	      the original 16 bit file has been	lost, then, strictly speaking,
	      no  dither is needed if converting the file back to 16 bit.  See
	      also the stats effect for	how to determine the actual bit	 depth
	      of the audio within a file.

       --effects-file FILENAME
	      Use  FILENAME  to	 obtain	 all effects and their arguments.  The
	      file is parsed as	if the values were specified  on  the  command
	      line.   A	 new line can be used in place of the special :	marker
	      to separate effect chains.  For convenience, such	markers	at the
	      end  of the file are normally ignored; if	you want to specify an
	      empty last effects chain,	use an explicit	:  by  itself  on  the
	      last line	of the file.  This option causes any effects specified
	      on the command line to be	discarded.
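
              As a hedged illustration of the file layout described above, the
              following sketch writes an effects file holding three chains,
              one per line, intended to mirror `trim 0 30 : newfile : restart'.
              The file name split.effects and the input name infile.wav are
              placeholders, and the sox call runs only if both the program and
              the input file are present.

```shell
# Write one effects chain per line; a newline stands in for the `:' marker.
effects_file="${TMPDIR:-/tmp}/split.effects"    # hypothetical location
cat > "$effects_file" <<'EOF'
trim 0 30
newfile
restart
EOF

# --effects-file is a global option, so it is given before the file names.
if command -v sox >/dev/null 2>&1 && [ -f infile.wav ]; then
    sox --effects-file "$effects_file" infile.wav output.wav
fi
```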

       -G, --guard
	      Automatically invoke the gain effect to guard against  clipping.
		 sox -G	infile -b 16 outfile rate 44100	dither -s
	      is shorthand for
		 sox infile -b 16 outfile gain -h rate 44100 gain -rh dither -s
	      See also -V, --norm, and the gain	effect.

       -h, --help
	      Show version number and usage information.

       --help-effect NAME
	      Show  usage  information	on the specified effect.  The name all
	      can be used to show usage	on all effects.

       --help-format NAME
	      Show information about the specified file	format.	 The name  all
	      can be used to show information on all formats.

       --i, --info
	      Only if given as the first parameter to sox, behave as soxi(1).

       -m|-M  Equivalent to --combine mix and --combine	merge, respectively.

       --magic
              If SoX has been built with the optional `libmagic' library then
              this option can be given to enable its use in helping to detect
              audio file types.

       --multi-threaded	| --single-threaded
	      By  default,  SoX	is `single threaded'.  If the --multi-threaded
	      option is	given however then SoX will process audio channels for
	      most multi-channel effects in parallel on	hyper-threading/multi-
	      core architectures. This	may  reduce  processing	 time,	though
	      sometimes	 it may	be necessary to	use this option	in conjunction
              with a larger buffer size than is the default to gain any benefit
              from multi-threaded processing (e.g. 131072; see --buffer above).

       --no-clobber
              Prompt before overwriting an existing file with the same name as
              that given for the output file.

	      N.B.   Unintentionally  overwriting  a  file  is easier than you
	      might think, for example,	if you accidentally enter
		 sox file1 file2 effect1 effect2 ...
	      when what	you really meant was
		 play file1 file2 effect1 effect2 ...
	      then, without this option, file2 will  be	 overwritten.	Hence,
	      using  this  option  is recommended. SOX_OPTS (above), a `shell'
	      alias, script, or	batch file may be an appropriate way of	perma-
	      nently enabling it.

       --norm[=dB-level]
              Automatically invoke the gain effect to guard against clipping
              and to normalise the audio. E.g.
		 sox --norm infile -b 16 outfile rate 44100 dither -s
	      is shorthand for
		 sox infile -b 16 outfile gain -h rate 44100 gain -nh dither -s
	      Optionally, the audio can	be normalized to a given  level	 (usu-
	      ally) below 0 dBFS:
		 sox --norm=-3 infile outfile

	      See also -V, -G, and the gain effect.

       --play-rate-arg ARG
	      Selects  a  quality  option to be	used when the `rate' effect is
	      automatically invoked whilst playing audio.  This	option is typ-
	      ically set via the SOX_OPTS environment variable (see above).

       --plot gnuplot|octave|off
	      If not set to off	(the default if	--plot is not given), run in a
	      mode that	can be used, in	conjunction with the  gnuplot  program
	      or the GNU Octave	program, to assist with	the selection and con-
	      figuration of many of the	transfer-function based	effects.   For
	      the  first given effect that supports the	selected plotting pro-
	      gram, SoX	will output commands to	 plot  the  effect's  transfer
	      function,	 and  then exit	without	actually processing any	audio.
		 sox --plot octave input-file -n highpass 1320 > highpass.plt
		 octave	highpass.plt

       -q, --no-show-progress
	      Run in quiet mode	when SoX wouldn't otherwise do	so.   This  is
	      the opposite of the -S option.

       -R     Run  in `repeatable' mode.  When this option is given, where ap-
	      plicable,	SoX will embed a fixed time-stamp in the  output  file
	      (e.g.   AIFF)  and  will	`seed' pseudo random number generators
	      (e.g.  dither) with a fixed number, thus ensuring	 that  succes-
	      sive  SoX	 invocations with the same inputs and the same parame-
	      ters yield the same output.

       --replay-gain track|album|off
	      Select whether or	not to apply replay-gain adjustment  to	 input
	      files.  The default is off for sox and rec, album	for play where
	      (at least) the first two input files are tagged  with  the  same
	      Artist and Album names, and track	for play otherwise.

       -S, --show-progress
	      Display  input  file  format/header  information,	and processing
	      progress as input	file(s)	percentage complete, elapsed time, and
	      remaining	 time (if known; shown in brackets), and the number of
	      samples written to the output file.  Also	shown is a  peak-level
	      meter,  and  an  indication if clipping has occurred.  The peak-
	      level meter shows	up to two channels and is calibrated for digi-
	      tal audio	as follows (right channel shown):

			    dB FSD   Display   dB FSD	Display
			     -25     -		-11	====
			     -23     =		 -9	====-
			     -21     =-		 -7	=====
			     -19     ==		 -5	=====-
			     -17     ==-	 -3	======
			     -15     ===	 -1	=====!
			     -13     ===-

	      A	 three-second peak-held	value of headroom in dBs will be shown
	      to the right of the meter	if this	is below 6dB.

	      This option is enabled by	default	when  using  SoX  to  play  or
	      record audio.

       -T     Equivalent to --combine multiply.

       --temp DIRECTORY
	      Specify  that any	temporary files	should be created in the given
	      DIRECTORY.  This can be useful if	there are permission or	 free-
	      space  problems  with  the default location. In this case, using
              `--temp .' (to use the current directory) is often a good
              solution.

       --version
              Show SoX's version number and exit.

       -V[level]
              Set verbosity.  This is particularly useful for seeing how any
              automatic effects have been invoked by SoX.

	      SoX displays messages on the console (stderr) according  to  the
	      following	verbosity levels:

	      0	     No	 messages are shown at all; use	the exit status	to de-
		     termine if	an error has occurred.

	      1	     Only error	messages are shown.  These  are	 generated  if
		     SoX cannot	complete the requested commands.

	      2	     Warning  messages are also	shown.	These are generated if
		     SoX can complete the requested commands, but not  exactly
		     according	to  the	 requested  command  parameters, or if
		     clipping occurs.

	      3	     Descriptions of SoX's processing phases are  also	shown.
                     Useful for seeing exactly how SoX is processing your
                     audio.

	      4	and above
		     Messages to help with debugging SoX are also shown.

	      By default, the verbosity	level is set to	2  (shows  errors  and
	      warnings).  Each	occurrence of the -V option increases the ver-
	      bosity level by 1.  Alternatively, the verbosity	level  can  be
	      set to an	absolute number	by specifying it immediately after the
	      -V, e.g.	-V0 sets it to 0.
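
              The counting rule can be modelled with a small shell function.
              This is only a sketch of the rule as described here, not SoX's
              own option parser; the function name effective_verbosity is
              hypothetical, and the handling of a plain -V that follows an
              absolute setting (it adds 1 to that setting) is an assumption.

```shell
# Model the verbosity level that results from a list of -V options.
effective_verbosity() {
    level=2                              # documented default
    for arg in "$@"; do
        case $arg in
            -V)        level=$((level + 1)) ;;   # each plain -V adds 1
            -V[0-9]*)  level=${arg#-V} ;;        # -VN sets an absolute level
        esac
    done
    echo "$level"
}

effective_verbosity -V -V    # two increments above the default
effective_verbosity -V0      # absolute level: no messages at all
```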

   Input File Options
       These options apply only	to input files	and  may  precede  only	 input
       filenames on the	command	line.

       --ignore-length
              Override an (incorrect) audio length given in an audio file's
              header. If this option is given then SoX will keep reading audio
              until it reaches the end of the input file.

       -v, --volume FACTOR
	      Intended	for  use when combining	multiple input files, this op-
	      tion adjusts the volume of the file that follows it on the  com-
	      mand line	by a factor of FACTOR. This allows it to be `balanced'
	      w.r.t. the other input files.  This is a linear (amplitude)  ad-
	      justment,	 so  a	number	less than 1 decreases the volume and a
	      number greater than 1 increases it.  If  a  negative  number  is
	      given  then in addition to the volume adjustment,	the audio sig-
	      nal will be inverted.

	      See also the norm, vol, and gain effects,	 and  see  Input  File
	      Balancing	above.
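
              Because FACTOR is an amplitude ratio, a level change expressed
              in dB has to be converted first, using the standard relation
              factor = 10^(dB/20).  A minimal sketch; the helper name
              db_to_factor and the file names in the comment are illustrative
              only.

```shell
# Convert a level change in dB to a linear amplitude factor for -v.
db_to_factor() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

db_to_factor -6     # roughly half amplitude
db_to_factor 6      # roughly double amplitude

# Possible use when mixing (file names are placeholders):
#   sox -m -v "$(db_to_factor -6)" loud.wav quiet.wav mixed.wav
```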

   Input & Output File Format Options
       These options apply to the input	or output file whose name they immedi-
       ately precede on	the command line and are used mainly when working with
       headerless file formats or when specifying a format for the output file
       that is different to that of the	input file.

       -b BITS,	--bits BITS
	      The number of bits (a.k.a. bit-depth or  sometimes  word-length)
	      in  each	encoded	 sample.   Not applicable to complex encodings
	      such as MP3 or GSM.  Not necessary with encodings	 that  have  a
              fixed number of bits, e.g. A/µ-law, ADPCM.

	      For an input file, the most common use for this option is	to in-
	      form SoX of the number of	bits per sample	in a  `raw'  (`header-
	      less') audio file.  For example
		 sox -r	16k -e signed -b 8 input.raw output.wav
              converts a particular `raw' file to a self-describing `WAV'
              file.

	      For an output file, this option can be used (perhaps along  with
	      -e)  to  set the output encoding size.  By default (i.e. if this
	      option is	not given), the	output encoding	size  will  (providing
	      it is supported by the output file type) be set to the input en-
	      coding size.  For	example
		 sox input.cdda	-b 24 output.wav
	      converts raw CD digital  audio  (16-bit,	signed-integer)	 to  a
	      24-bit (signed-integer) `WAV' file.

       -c CHANNELS, --channels CHANNELS
	      The  number of audio channels in the audio file. This can	be any
	      number greater than zero.

	      For an input file, the most common use for this option is	to in-
	      form SoX of the number of	channels in a `raw' (`headerless') au-
	      dio file.	 Occasionally, it may be useful	 to  use  this	option
	      with a `headered'	file, in order to override the (presumably in-
	      correct) value in	the header - note that this is only  supported
	      with certain file	types.	Examples:
		 sox -r	48k -e float -b	32 -c 2	input.raw output.wav
              converts a particular `raw' file to a self-describing `WAV'
              file.
		 play -c 1 music.wav
	      interprets the file data as belonging to a  single  channel  re-
	      gardless	of what	is indicated in	the file header.  Note that if
	      the file does in fact have two channels, this will result	in the
	      file playing at half speed.
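
              The half-speed effect follows from simple arithmetic on the raw
              data size.  As a rough sketch (the helper raw_duration is
              hypothetical and it ignores any header bytes), the playing time
              of uncompressed PCM data is its size in bytes divided by
              rate x channels x bytes per sample:

```shell
# Duration in seconds of headerless PCM data:
#   bytes / (rate * channels * bits / 8)
raw_duration() {
    awk -v bytes="$1" -v rate="$2" -v ch="$3" -v bits="$4" \
        'BEGIN { printf "%.2f\n", bytes / (rate * ch * bits / 8) }'
}

# 3840000 bytes of 32-bit audio sampled at 48 kHz:
raw_duration 3840000 48000 2 32   # interpreted as two channels
raw_duration 3840000 48000 1 32   # interpreted as one channel: twice as long
```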

	      For  an output file, this	option provides	a shorthand for	speci-
	      fying that the channels effect should be	invoked	 in  order  to
	      change (if necessary) the	number of channels in the audio	signal
	      to the number given.  For	example, the  following	 two  commands
	      are equivalent:
		 sox input.wav -c 1 output.wav bass -b 24
		 sox input.wav	    output.wav bass -b 24 channels 1
	      though the second	form is	more flexible as it allows the effects
	      to be ordered arbitrarily.

       -e ENCODING, --encoding ENCODING
	      The audio	encoding type.	Sometimes needed with file-types  that
	      support more than	one encoding type. For example,	with raw, WAV,
	      or AU (but not, for example, with	MP3 or FLAC).	The  available
	      encoding types are as follows:

              signed-integer
                     PCM data stored as signed (`two's complement') integers.
                     Commonly used with a 16 or 24-bit encoding size.  A
                     value of 0 represents minimum signal power.

              unsigned-integer
                     PCM data stored as unsigned integers.  Commonly used with
                     an 8-bit encoding size.  A value of 0 represents maximum
                     signal power.

              floating-point
                     PCM data stored as IEEE 754 single precision (32-bit) or
                     double precision (64-bit) floating-point (`real')
                     numbers.  A value of 0 represents minimum signal power.

	      a-law  International telephony standard for logarithmic encoding
		     to	8 bits per sample.  It has a precision	equivalent  to
		     roughly 13-bit PCM	and is sometimes encoded with reversed
		     bit-ordering (see the -X option).

	      u-law, mu-law
                     North American telephony standard for logarithmic
                     encoding to 8 bits per sample.  A.k.a. µ-law.  It has a
                     precision equivalent to roughly 14-bit PCM and is
                     sometimes encoded with reversed bit-ordering (see the -X
                     option).

              oki-adpcm
                     OKI (a.k.a. VOX, Dialogic, or Intel) 4-bit ADPCM; it has
                     a precision equivalent to roughly 12-bit PCM.  ADPCM is a
                     form of audio compression that has a good compromise
                     between audio quality and encoding/decoding speed.

              ima-adpcm
                     IMA (a.k.a. DVI) 4-bit ADPCM; it has a precision
                     equivalent to roughly 13-bit PCM.

              ms-adpcm
                     Microsoft 4-bit ADPCM; it has a precision equivalent to
                     roughly 14-bit PCM.

              gsm-full-rate
                     GSM is currently used for the vast majority of the
                     world's digital wireless telephone calls.  It utilises
                     several audio formats with different bit-rates and
                     associated speech quality.  SoX has support for GSM's
                     original 13kbps `Full Rate' audio format.  It is usually
                     CPU-intensive to work with GSM audio.

	      Encoding names can be abbreviated	where this would  not  be  am-
	      biguous;	e.g.  `unsigned-integer' can be	given as `un', but not
	      `u' (ambiguous with `u-law').
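
              One way to picture the abbreviation rule is as unique-prefix
              matching.  The sketch below is illustrative only and is not
              taken from SoX's source; the function name resolve_encoding is
              hypothetical, and the list of encoding names is an assumption
              that should be checked against the installed version.

```shell
# Resolve an abbreviated encoding name by unique-prefix matching.
resolve_encoding() {
    matches=""
    count=0
    # Assumed list of encoding names (verify against your SoX build).
    for name in signed-integer unsigned-integer floating-point a-law u-law \
                mu-law oki-adpcm ima-adpcm ms-adpcm gsm-full-rate; do
        case $name in
            "$1"*) matches="$matches $name"; count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "${matches# }"
    else
        echo "ambiguous or unknown encoding: $1" >&2
        return 1
    fi
}

resolve_encoding un                       # unique prefix of unsigned-integer
resolve_encoding u 2>/dev/null || echo "u is rejected"
```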

	      For an input file, the most common use for this option is	to in-
	      form  SoX	 of  the encoding of a `raw' (`headerless') audio file
	      (see the examples	in -b and -c above).

              For an output file, this option can be used (perhaps along with
              -b) to set the output encoding type.  For example
                 sox input.cdda -e float output1.wav
              and
                 sox input.cdda -b 64 -e float output2.wav
	      convert  raw CD digital audio (16-bit, signed-integer) to	float-
	      ing-point	`WAV' files (single & double precision respectively).

	      By default (i.e. if this option is not given), the output	encod-
	      ing  type	 will  (providing  it  is supported by the output file
	      type) be set to the input	encoding type.

       --no-glob
              Specifies that filename `globbing' (wild-card matching) should
              not be performed by SoX on the following filename.  For example,
              if the current directory contains the two files
              `five-seconds.wav' and `five*.wav', then
		 play --no-glob	"five*.wav"
	      can be used to play just the single file `five*.wav'.

       -r, --rate RATE[k]
              Gives the sample rate in Hz (or kHz if appended with `k') of the
              file.

	      For an input file, the most common use for this option is	to in-
	      form SoX of the sample rate of a `raw' (`headerless') audio file
	      (see the examples	in -b and -c above).  Occasionally it  may  be
	      useful  to  use  this option with	a `headered' file, in order to
	      override the (presumably incorrect) value	in the header  -  note
	      that  this is only supported with	certain	file types.  For exam-
	      ple, if audio was	recorded with a	sample-rate of say 48k from  a
	      source that played back a	little,	say 1.5%, too slowly, then
		 sox -r	48720 input.wav	output.wav
	      effectively  corrects the	speed by changing only the file	header
	      (but see also the	speed effect for the more  usual  solution  to
	      this problem).
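
              The figure 48720 in this example is just the nominal rate scaled
              by the speed error: 48000 x 1.015.  A small sketch using integer
              shell arithmetic, with the error given in tenths of a percent to
              avoid floating-point rounding (the helper name corrected_rate is
              hypothetical):

```shell
# Scale a nominal sample rate by a speed error given in tenths of a percent.
corrected_rate() {
    echo $(( $1 * (1000 + $2) / 1000 ))
}

corrected_rate 48000 15    # source played 1.5% too slowly
```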

	      For  an output file, this	option provides	a shorthand for	speci-
	      fying that the rate effect should	be invoked in order to	change
	      (if  necessary) the sample rate of the audio signal to the given
	      value.  For example, the following two commands are equivalent:
		 sox input.wav -r 48k output.wav bass -b 24
		 sox input.wav	      output.wav bass -b 24 rate 48k
              though the second form is more flexible as it allows rate
              options to be given, and allows the effects to be ordered
              arbitrarily.

       -t, --type FILE-TYPE
              Gives the type of the audio file.  For both input and output
              files, this option is commonly used to inform SoX of the type of
              a `headerless' audio file (e.g. raw, mp3) where the
              actual/desired type cannot be determined from a given filename
              extension.  For example:
                 another-command | sox -t mp3 - output.wav

                 sox input.wav -t raw output.bin
	      It can also be used to override the type	implied	 by  an	 input
	      filename	extension,  but	 if  overriding	with a type that has a
	      header, SoX will exit with an appropriate	error message if  such
	      a	header is not actually present.

	      See soxformat(7) for a list of supported file types.

       -L, --endian little
       -B, --endian big
       -x, --endian swap
	      These  options  specify whether the byte-order of	the audio data
	      is, respectively,	`little	endian', `big endian', or the opposite
	      to  that	of  the	system on which	SoX is being used.  Endianness
	      applies only to data encoded as floating-point, or as signed  or
	      unsigned	integers of 16 or more bits.  It is often necessary to
	      specify one of these options for headerless files, and sometimes
	      necessary	 for  (otherwise)  self-describing files.  A given en-
	      dian-setting option may be  ignored  for	an  input  file	 whose
	      header contains a	specific endianness identifier,	or for an out-
	      put file that is actually	an audio device.

	      N.B.  Unlike other format	characteristics, the endianness	(byte,
	      nibble,  &  bit ordering)	of the input file is not automatically
	      used for the output file;	so, for	example, when the following is
	      run on a little-endian system:
		 sox -B	audio.s16 trimmed.s16 trim 2
	      trimmed.s16 will be created as little-endian;
		 sox -B	audio.s16 -B trimmed.s16 trim 2
	      must be used to preserve big-endianness in the output file.

	      The -V option can	be used	to check the selected orderings.

       -N, --reverse-nibbles
	      Specifies	that the nibble	ordering (i.e. the 2 halves of a byte)
	      of the samples should be reversed; sometimes useful with	ADPCM-
	      based formats.

	      N.B.  See	also N.B. in section on	-x above.

       -X, --reverse-bits
	      Specifies	 that  the  bit	 ordering of the samples should	be re-
	      versed; sometimes	useful with a few (mostly headerless) formats.

	      N.B.  See	also N.B. in section on	-x above.

   Output File Format Options
       These options apply only	to the output file and may  precede  only  the
       output filename on the command line.

       --add-comment TEXT
	      Append a comment in the output file header (where	applicable).

       --comment TEXT
	      Specify  the  comment  text  to  store in	the output file	header
	      (where applicable).

	      SoX will provide a default comment if  this  option  (or	--com-
	      ment-file)  is  not  given. To specify that no comment should be
	      stored in	the output file, use --comment "" .

       --comment-file FILENAME
	      Specify a	file containing	the comment text to store in the  out-
	      put file header (where applicable).

       -C, --compression FACTOR
	      The compression factor for variably compressing output file for-
	      mats.  If	this option is not given then  a  default  compression
	      factor  will  apply.  The	compression factor is interpreted dif-
	      ferently for different compressing file formats.	 See  the  de-
	      scription	 of  the  file formats that use	this option in soxfor-
	      mat(7) for more information.

       In addition to converting, playing and recording	audio files,  SoX  can
       be used to invoke a number of audio `effects'.  Multiple	effects	may be
       applied by specifying them one after another at the end of the SoX com-
       mand line, forming an `effects chain'.  Note that applying multiple ef-
       fects in	real-time (i.e.	when playing audio) is	likely	to  require  a
       high  performance  computer.  Stopping other applications may alleviate
       performance issues should they occur.

       Some of the SoX effects are primarily intended to be applied to a  sin-
       gle  instrument	or  `voice'.  To facilitate this, the remix effect and
       the global SoX option -M	can be used to isolate then  recombine	tracks
       from a multi-track recording.

   Multiple Effects Chains
       A  single  effects chain	is made	up of one or more effects.  Audio from
       the input runs through the chain	until either the end of	the input file
       is reached or an	effect in the chain requests to	terminate the chain.

       SoX  supports running multiple effects chains over the input audio.  In
       this case, when one chain indicates it is done  processing  audio,  the
       audio data is then sent through the next	effects	chain.	This continues
       until either no more effects chains exist or the	input has reached  the
       end of the file.

       An  effects chain is terminated by placing a : (colon) after an effect.
       Any following effects are a part	of a new effects chain.

       It is important to place the effect that will stop the chain as the
       first effect in the chain.  This is because any samples that are
       buffered by effects to the left of the terminating effect will be
       discarded.  The number of samples discarded is related to the --buffer
       option, which should be kept small, relative to the sample rate, if the
       terminating effect cannot be first.  Further information on stopping
       effects can be found in the Stopping SoX section.

       There are a few pseudo-effects that aid using multiple effects  chains.
       These include newfile which will	start writing to a new output file be-
       fore moving to the next effects chain and restart which will move  back
       to  the	first  effects chain.  Pseudo-effects must be specified	as the
       first effect in a chain and as the only effect in a  chain  (they  must
       have a :	before and after they are specified).

       The following is an example of multiple effects chains.  It will split
       the input file into multiple files of 30 seconds in length.  Each
       output filename will have a unique number in its name as documented in
       the Output Files section.
	  sox infile.wav output.wav trim 0 30 :	newfile	: restart

   Common Notation And Parameters
       In the descriptions that	follow,	brackets [ ] are used to denote	param-
       eters  that  are	optional, braces { } to	denote those that are both op-
       tional and repeatable, and angle	brackets < > to	denote those that  are
       repeatable but not optional.  Where applicable, default values for
       optional parameters are shown in parentheses ( ).

       The following parameters	are used with, and have	the same meaning  for,
       several effects:

       center[k]
              See frequency.

       frequency[k]
              A frequency in Hz, or, if appended with `k', kHz.

       gain   A power gain in dB.  Zero gives no gain; less than zero gives an
              attenuation.

       position
              A position within the audio stream; the syntax is
              [=|+|-]timespec, where timespec is a time specification (see
              below).  The
	      optional first character indicates whether the timespec is to be
	      interpreted relative to the start	(=) or end (-) of audio, or to
	      the previous position if the effect  accepts  multiple  position
	      arguments	 (+).  The audio length	must be	known for end-relative
	      locations	to work; some effects do accept	-0  for	 end-of-audio,
	      though,  even if the length is unknown.  Which of	=, +, -	is the
	      default depends on the effect and	is shown  in  its  syntax  as,
	      e.g., position(+).

	      Examples:	 =2:00 (two minutes into the audio stream), -100s (one
	      hundred samples before the end of	audio),	+0:12+10s (twelve sec-
	      onds  and	ten samples after the previous position), -0.5+1s (one
	      sample less than half a second before the	end of audio).

       width[h|k|o|q]
              Used to specify the band-width of a filter.  A number of
              different methods to specify the width are available (though not
              all for every effect).  One of the characters shown may be
              appended to select the desired method as follows:

                                        Method     Notes
                                   h    Hz
                                   k    kHz
                                   o    Octaves
                                   q    Q-factor   See [2]

              For each effect that uses this parameter, the default method
              (i.e. if no character is appended) is the one that is listed
              first in the first line of the effect's description.

       Most  effects that expect an audio position or duration in a parameter,
       i.e. a time specification, accept either	of the following two forms:

       [[hours:]minutes:]seconds[.frac][t]
              A specification of `1:30.5' corresponds to one minute, thirty
	      and  1/2	seconds.   The t suffix	is entirely optional (however,
	      see the silence effect for an exception).	 Note that the	compo-
	      nent  values  do	not  have  to  be normalized; e.g., `1:23:45',
	      `83:45', `79:0285', `1:0:1425', `1::1425'	and `5025' all are le-
	      gal and equivalent to each other.

       sampless
              Specifies the number of samples directly, as in `8000s'.  For
	      large sample counts, e notation is supported:  `1.7e6s'  is  the
	      same as `1700000s'.

       Time  specifications  can  also	be chained with	+ or - into a new time
       specification where the right part is added to or subtracted  from  the
       left,  respectively:  `3:00-200s'  means	 two hundred samples less than
       three minutes.
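
       The equivalence of the un-normalised clock forms, and the chained
       example, can be checked with a little arithmetic.  The sketch below is
       one reading of the rules given here rather than SoX's own parser: the
       helper name timespec_seconds is hypothetical, it handles only the
       colon-separated form (with an optional trailing t), and the 48000 Hz
       sample rate used for the chained example is an assumed value.

```shell
# Convert a colon-separated time (hours:minutes:seconds, any component
# un-normalised or empty) to seconds.
timespec_seconds() {
    awk -v t="$1" 'BEGIN {
        sub(/t$/, "", t)                 # drop the optional t suffix
        n = split(t, f, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + f[i]
        print s
    }'
}

# All of these name the same instant.
for spec in 1:23:45 83:45 79:0285 1:0:1425 1::1425 5025; do
    printf '%-9s -> %s s\n' "$spec" "$(timespec_seconds "$spec")"
done

# `3:00-200s' expressed in samples at an assumed rate of 48000 Hz.
rate=48000
echo $(( $(timespec_seconds 3:00) * rate - 200 ))
```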

       To see if SoX has support for an	optional effect, enter sox -h and look
       for its name under the list: `EFFECTS'.

   Supported Effects
       Note:  a	categorised list of the	effects	can be found in	the accompany-
       ing `README' file.

       allpass frequency[k] width[h|k|o|q]
	      Apply a two-pole all-pass	filter with central frequency (in  Hz)
	      frequency,  and  filter-width width.  An all-pass	filter changes
	      the audio's frequency to phase relationship without changing its
	      frequency	to amplitude relationship.  The	filter is described in
	      detail in	[1].

	      This effect supports the --plot global option.

       band [-n] center[k] [width[h|k|o|q]]
	      Apply a band-pass	filter.	 The frequency	response  drops	 loga-
	      rithmically  around  the	center frequency.  The width parameter
	      gives the	slope of the drop.  The	frequencies at center +	 width
	      and  center  -  width will be half of their original amplitudes.
	      band defaults to a mode oriented to pitched audio,  i.e.	voice,
	      singing,	or instrumental	music.	The -n (for noise) option uses
	      the alternate  mode  for	un-pitched  audio  (e.g.  percussion).
	      Warning: -n introduces a power-gain of about 11dB	in the filter,
	      so beware	of output clipping.   band  introduces	noise  in  the
	      shape  of	 the  filter, i.e. peaking at the center frequency and
	      settling around it.

	      This effect supports the --plot global option.

	      See also sinc for	a bandpass filter with steeper shoulders.

       bandpass|bandreject [-c]	frequency[k] width[h|k|o|q]
	      Apply a two-pole Butterworth  band-pass  or  band-reject	filter
	      with  central  frequency	frequency,  and	(3dB-point) band-width
	      width.  The -c option applies only to  bandpass  and  selects  a
	      constant skirt gain (peak	gain = Q) instead of the default: con-
	      stant 0dB	peak gain.  The	filters	roll off  at  6dB  per	octave
	      (20dB per	decade)	and are	described in detail in [1].

	      These effects support the	--plot global option.

	      See also sinc for	a bandpass filter with steeper shoulders.
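
              The two roll-off figures quoted above describe the same slope.
              As a quick check, assuming a first-order slope in which
              amplitude is proportional to frequency, a doubling of frequency
              corresponds to 20*log10(2) dB and a tenfold increase to
              20*log10(10) dB.  The function name rolloff is hypothetical.

```shell
# Express a first-order filter slope per octave and per decade, in dB.
rolloff() {
    awk 'BEGIN {
        per_octave = 20 * log(2) / log(10)     # 20*log10(2)
        per_decade = 20 * log(10) / log(10)    # 20*log10(10)
        printf "%.2f dB/octave, %.0f dB/decade\n", per_octave, per_decade
    }'
}

rolloff
```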

       bandreject frequency[k] width[h|k|o|q]
	      Apply a band-reject filter.  See the description of the bandpass
	      effect for details.

       bass|treble gain	[frequency[k] [width[s|h|k|o|q]]]
	      Boost or cut the bass (lower) or treble (upper)  frequencies  of
	      the audio	using a	two-pole shelving filter with a	response simi-
	      lar to that of a standard	hi-fi's	tone-controls.	This  is  also
	      known as shelving	equalisation (EQ).

	      gain  gives  the	gain  at  0 Hz (for bass), or whichever	is the
	      lower of ~22 kHz and the Nyquist frequency  (for	treble).   Its
	      useful  range is about -20 (for a	large cut) to +20 (for a large
	      boost).  Beware of Clipping when using a positive	gain.

	      If desired, the filter can be fine-tuned using the following op-
	      tional parameters:

	      frequency	sets the filter's central frequency and	so can be used
	      to extend	or reduce the frequency	range to be  boosted  or  cut.
	      The default value	is 100 Hz (for bass) or	3 kHz (for treble).

	      width determines how steep is the	filter's shelf transition.  In
	      addition to the common  width  specification  methods  described
	      above,  `slope'  (the  default,  or if appended with `s')	may be
	      used.  The useful	range of `slope' is about 0.3,	for  a	gentle
	      slope,  to 1 (the	maximum), for a	steep slope; the default value
	      is 0.5.

	      The filters are described	in detail in [1].

	      These effects support the	--plot global option.

	      See also equalizer for a peaking equalisation effect.

       bend [-f frame-rate(25)] [-o over-sample(16)]
            { start-position(+),cents,end-position(+) }
	      Changes  pitch  by  specified  amounts at	specified times.  Each
	      given triple:  start-position,cents,end-position	specifies  one
	      bend.   cents is the number of cents (100	cents =	1 semitone) by
	      which to bend the	pitch. The other values	specify	the points  in
	      time at which to start and end bending the pitch,	respectively.

	      The pitch-bending	algorithm utilises the Discrete	Fourier	Trans-
	      form (DFT) at a particular frame rate  and  over-sampling	 rate.
	      The  -f and -o parameters	may be used to adjust these parameters
	      and thus control the smoothness of the changes in	pitch.

	      For example, an initial  tone  is	 generated,  then  bent	 three
	      times, yielding four different notes in total:
		 play -n synth 2.5 sin 667 gain	1 \
		   bend	.35,180,.25  .15,740,.53  0,-520,.3
	      Here,  the  first	bend runs from 0.35 to 0.6, and	the second one
	      from 0.75	to 1.28	seconds.  Note that the	clipping that is  pro-
	      duced  in	 this example is deliberate; to	remove it, use gain -5
	      in place of gain 1.
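
              Because each position here is relative to the previous one (+),
              the absolute times can be worked out by accumulation.  A sketch
              that reproduces the figures above and also gives the span of the
              third bend; the helper name bend_times is hypothetical.

```shell
# Print the absolute start and end time of each bend, given
# start,cents,end triples whose positions are relative to the previous one.
bend_times() {
    printf '%s\n' "$@" | awk -F, '{
        start = pos + $1          # start is relative to the previous end
        pos = start + $3          # end is relative to this start
        printf "%.2f %.2f\n", start, pos
    }'
}

bend_times .35,180,.25 .15,740,.53 0,-520,.3
```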

	      See also pitch.

       biquad b0 b1 b2 a0 a1 a2
              Apply a biquad IIR filter with the given coefficients, where b*
              and a* are the numerator and denominator coefficients
              respectively.

	      See (where a0
	      =	1).

	      This effect supports the --plot global option.

       channels	CHANNELS
	      Invoke  a	 simple	 algorithm to change the number	of channels in
	      the audio	signal to the given number  CHANNELS:  mixing  if  de-
	      creasing the number of channels or duplicating if	increasing the
	      number of	channels.

	      The channels effect is invoked automatically if SoX's -c	option
	      specifies	 a number of channels that is different	to that	of the
	      input file(s).  Alternatively, if	this effect is	given  explic-
	      itly,  then SoX's	-c option need not be given.  For example, the
	      following	two commands are equivalent:
		 sox input.wav -c 1 output.wav bass -b 24
		 sox input.wav	    output.wav bass -b 24 channels 1
	      though the second	form is	more flexible as it allows the effects
	      to be ordered arbitrarily.

	      See  also	 remix	for  an	 effect	 that  allows  channels	 to be
	      mixed/selected arbitrarily.

       chorus gain-in gain-out <delay decay speed depth	-s|-t>
	      Add a chorus effect to the audio.	 This can make a single	 vocal
	      sound like a chorus, but can also	be applied to instrumentation.

	      Chorus  resembles	an echo	effect with a short delay, but whereas
	      with echo	the delay is constant, with chorus, it is varied using
	      sinusoidal or triangular modulation.  The modulation depth
	      defines how far the modulated delay swings either side of the
	      nominal delay.  The delayed sound is therefore played slightly
	      slower or faster, i.e. detuned around the original, as in a
	      chorus where some voices are slightly off key.  See [3] for
	      more discussion of the chorus effect.

	      Each four-tuple parameter	delay/decay/speed/depth	gives the  de-
	      lay  in  milliseconds and	the decay (relative to gain-in)	with a
	      modulation speed in Hz using depth in milliseconds.  The modula-
	      tion  is either sinusoidal (-s) or triangular (-t).  Gain-out is
	      the volume of the	output.

	      A	typical	delay is around	40ms to	60ms; the modulation speed  is
	      best near	0.25Hz and the modulation depth	around 2ms.  For exam-
	      ple, a single delay:
		 play guitar1.wav chorus 0.7 0.9 55 0.4	0.25 2 -t
	      Two delays of the	original samples:
		 play guitar1.wav chorus 0.6 0.9 50 0.4	0.25 2 -t \
		    60 0.32 0.4	1.3 -s
	      A	fuller sounding	chorus (with three additional delays):
		 play guitar1.wav chorus 0.5 0.9 50 0.4	0.25 2 -t \
		    60 0.32 0.4	2.3 -t 40 0.3 0.3 1.3 -s

       compand attack1,decay1{,attack2,decay2}
	      [soft-knee-dB:]in-dB1[,out-dB1]{,in-dB2,out-dB2}
	      [gain [initial-volume-dB [delay]]]

	      Compand (compress	or expand) the dynamic range of	the audio.

	      The attack and decay parameters (in seconds) determine the  time
	      over  which the instantaneous level of the input signal is aver-
	      aged to determine	its volume; attacks refer to increases in vol-
	      ume and decays refer to decreases.  For most situations, the at-
	      tack time	(response to  the  music  getting  louder)  should  be
	      shorter than the decay time because the human ear	is more	sensi-
	      tive to sudden loud music	than sudden soft  music.   Where  more
	      than one pair of attack/decay parameters are specified, each in-
	      put channel is companded separately and the number of pairs must
	      agree  with  the	number	of input channels.  Typical values are
	      0.3,0.8 seconds.

	      The second parameter is a	list  of  points  on  the  compander's
	      transfer function	specified in dB	relative to the	maximum	possi-
	      ble signal amplitude.  The input values must be  in  a  strictly
	      increasing  order	 but the transfer function does	not have to be
	      monotonically rising.  If	omitted, the value of out-dB1 defaults
	      to  the  same  value as in-dB1; levels below in-dB1 are not com-
	      panded (but may have gain	applied	to them).  The	point  0,0  is
	      assumed  but  may	 be overridden (by 0,out-dBn).	If the list is
	      preceded by a soft-knee-dB value, then the points where adjacent
	      line segments on the transfer function meet will be rounded
	      by the amount given.  Typical values for the  transfer  function
	      are 6:-70,-60,-20.

	      The third	(optional) parameter is	an additional gain in dB to be
	      applied at all points on the transfer function and  allows  easy
	      adjustment of the	overall	gain.

	      The  fourth  (optional)  parameter is an initial level to	be as-
	      sumed for	each channel when companding starts.  This permits the
	      user  to supply a	nominal	level initially, so that, for example,
	      a	very large gain	is not applied to initial signal levels	before
	      the companding action has	begun to operate: it is	quite probable
	      that in such an event, the  output  would	 be  severely  clipped
	      while  the  compander  gain  properly adjusts itself.  A typical
	      value (for audio which is	initially quiet) is -90	dB.

	      The fifth	(optional) parameter is	a delay	in seconds.  The input
	      signal  is analysed immediately to control the compander,	but it
	      is delayed before	being fed to the volume	adjuster.   Specifying
	      a	delay approximately equal to the attack/decay times allows the
	      compander	to effectively operate in a `predictive' rather	than a
	      reactive mode.  A	typical	value is 0.2 seconds.

				    *	     *	      *

	      The  following  example  might  be used to make a	piece of music
	      with both	quiet and loud passages	suitable for listening to in a
	      noisy environment	such as	a moving vehicle:
		 sox asz.wav asz-car.wav compand 0.3,1 6:-70,-60,-20 -5	-90 0.2
	      The  transfer  function (`6:-70,...') says that very soft	sounds
	      (below -70dB) will remain	unchanged.  This will stop the compan-
	      der  from	 boosting  the volume on `silent' passages such	as be-
	      tween movements.	However, sounds	in  the	 range	-60dB  to  0dB
	      (maximum	volume)	will be	boosted	so that	the 60dB dynamic range
	      of the original music will be  compressed	 3-to-1	 into  a  20dB
	      range, which is wide enough to enjoy the music but narrow	enough
	      to get around the	road noise.  The `6:'  selects	6dB  soft-knee
	      companding.  The -5 (dB) output gain is needed to	avoid clipping
	      (the number is inexact, and  was	derived	 by  experimentation).
	      The  -90	(dB)  for the initial volume will work fine for	a clip
	      that starts with near silence, and the delay  of	0.2  (seconds)
	      has  the	effect	of  causing  the compander to react a bit more
	      quickly to sudden	volume changes.
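
	      The static part of this transfer function can be sketched as a
	      piecewise-linear map from input level to output level.  The
	      sketch deliberately ignores the soft knee, the extra gain
	      parameter and the attack/decay averaging, handles only finite
	      points, and assumes that levels below the first point keep a
	      slope of one; transfer_db is an illustrative name, not a SoX
	      function.

```python
def transfer_db(in_db, points):
    """Map an input level to an output level, both in dB.

    points is a list of (in_dB, out_dB) pairs.  The implied 0,0 point
    is added when the list stops short of 0 dB.  Soft knee, gain and
    time constants are not modelled.
    """
    pts = sorted(points)
    if pts[-1][0] < 0.0:
        pts.append((0.0, 0.0))
    first_in, first_out = pts[0]
    if in_db <= first_in:
        # Below the first point: not companded, so keep a 1:1 slope.
        return in_db + (first_out - first_in)
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        if in_db <= xb:
            return ya + (in_db - xa) * (yb - ya) / (xb - xa)
    return pts[-1][1]


if __name__ == "__main__":
    # `-70,-60,-20' read as: -70 maps to -70 (default), -60 maps to -20.
    car = [(-70.0, -70.0), (-60.0, -20.0)]
    for level in (-80, -70, -60, -30, 0):
        print(level, "->", transfer_db(level, car))
```

	      With these points the 60dB span from -60dB to 0dB maps onto the
	      20dB span from -20dB to 0dB, which is the 3-to-1 compression
	      described above.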

	      In the next example, compand is being used as a  noise-gate  for
	      when the noise is	at a lower level than the signal:
		 play infile compand .1,.2 -inf,-50.1,-inf,-50,-50 0 -90 .1
	      Here is another noise-gate, this time for	when the noise is at a
	      higher level than	the signal (making it, in some	ways,  similar
	      to squelch):
		 play infile compand .1,.1 -45.1,-45,-inf,0,-inf 45 -90	.1
	      This effect supports the --plot global option (for the transfer
	      function).

	      See also mcompand	for a multiple-band companding effect.

       contrast	[enhancement-amount(75)]
	      Comparable with compression, this	effect modifies	an audio  sig-
	      nal  to  make  it	sound louder.  enhancement-amount controls the
	      amount of	the enhancement	and is a number	in  the	 range	0-100.
	      Note  that enhancement-amount = 0	still gives a significant con-
	      trast enhancement.

	      See also the compand and mcompand	effects.

       dcshift shift [limitergain]
	      Apply a DC shift to the audio.  This can be useful to  remove  a
	      DC offset	(caused	perhaps	by a hardware problem in the recording
	      chain) from the audio.  The effect of a  DC  offset  is  reduced
	      headroom and hence volume.  The stat or stats effect can be used
	      to determine if a	signal has a DC	offset.

	      The given	dcshift	value is a floating point number in the	 range
	      of +-2 that indicates the	amount to shift	the audio (which is in
	      the range	of +-1).

	      An optional limitergain can be specified	as  well.   It	should
	      have  a  value  much less	than 1 (e.g. 0.05 or 0.02) and is used
	      only on peaks to prevent clipping.
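
	      To make the idea concrete, the sketch below estimates a DC
	      offset as the mean of the samples and removes it by applying
	      the opposite shift.  It works on plain lists of floats in the
	      range +-1, uses hard clipping as a stand-in for the limiter
	      (which is not modelled), and uses illustrative names throughout.

```python
def estimate_dc(samples):
    """Return the mean sample value, a simple estimate of DC offset."""
    return sum(samples) / len(samples)


def apply_shift(samples, shift):
    """Add shift to every sample, hard-clipping the result to [-1, 1].

    Hard clipping is a simplification; the optional limiter described
    in the text is not modelled here.
    """
    return [min(1.0, max(-1.0, s + shift)) for s in samples]


if __name__ == "__main__":
    audio = [0.3, 0.1, 0.5, 0.3]          # toy signal sitting on +0.3 DC
    offset = estimate_dc(audio)
    print("estimated offset:", offset)
    print("corrected:", apply_shift(audio, -offset))
```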

				    *	     *	      *

	      An alternative approach to removing a DC offset (albeit  with  a
	      short delay) is to use the highpass filter effect	at a frequency
	      of say 10Hz, as illustrated in the following example:
		 sox -n	dc.wav synth 5 sin %0 50
		 sox dc.wav fixed.wav highpass 10

       deemph Apply Compact Disc (IEC 60908) de-emphasis (a treble attenuation
	      shelving filter).

	      Pre-emphasis  was	applied	in the mastering of some CDs issued in
	      the early	1980s.	These included many classical music albums, as
	      well  as	now sought-after issues	of albums by The Beatles, Pink
	      Floyd and	others.	 Pre-emphasis should be	 removed  at  playback
	      time  by	a de-emphasis filter in	the playback device.  However,
	      not all modern CD	players	have this filter, and very few	PC  CD
	      drives have it; playing pre-emphasised audio without the correct
	      de-emphasis filter results in audio that sounds harsh and	is far
	      from what	its creators intended.

	      With  the	 deemph	 effect, it is possible	to apply the necessary
	      de-emphasis to audio that	has been extracted from	 a  pre-empha-
	      sised  CD, and then either burn the de-emphasised	audio to a new
	      CD (which	will then play correctly on any	CD player), or	simply
	      play the correctly de-emphasised audio files on the PC.  For
	      example:
		 sox track1.wav	track1-deemph.wav deemph
	      and then burn track1-deemph.wav to CD, or
		 play track1-deemph.wav
	      or simply
		 play track1.wav deemph
	      The de-emphasis filter is	implemented as a biquad	 and  requires
	      the input	audio sample rate to be	either 44.1kHz or 48kHz.  Max-
	      imum deviation from the ideal response is	 only  0.06dB  (up  to

	      This effect supports the --plot global option.

	      See also the bass	and treble shelving equalisation effects.

       delay {position(=)}
	      Delay  one  or  more  audio channels such	that they start	at the
	      given position.  For example, delay  1.5	+1  3000s  delays  the
	      first  channel by	1.5 seconds, the second	channel	by 2.5 seconds
	      (one second more than the	previous channel), the	third  channel
	      by  3000	samples,  and  leaves  any  other channels that	may be
	      present un-delayed.  The following (one long)  command  plays  a
	      chime sound:
		 play -n synth -j 3 sin	%3 sin %-2 sin %-5 sin %-9 \
		   sin %-14 sin	%-21 fade h .01	2 1.5 delay \
		   1.3 1 .76 .54 .27 remix - fade h 0 2.7 2.5 norm -1
	      and this plays a guitar chord:
		 play -n synth pl G2 pl	B2 pl D3 pl G3 pl D4 pl	G4 \
		   delay 0 .05 .1 .15 .2 .25 remix - fade 0 4 .1 norm -1

       dither [-S|-s|-f	filter]	[-a] [-p precision]
	      Apply  dithering	to  the	 audio.	 Dithering deliberately	adds a
	      small amount of noise to the signal in  order  to	 mask  audible
	      quantization effects that	can occur if the output	sample size is
	      less than	24 bits.  With no options, this	effect will add	trian-
	      gular  (TPDF) white noise.  Noise-shaping	(only for certain sam-
	      ple rates) can be	selected with -s.  With	the -f option,	it  is
	      possible	to  select  a particular noise-shaping filter from the
	      following	list: lipshitz,	f-weighted,  modified-e-weighted,  im-
	      proved-e-weighted, gesemann, shibata, low-shibata, high-shibata.
	      Note that	most filter types are available	only with 44100Hz sam-
	      ple  rate.   The filter types are	distinguished by the following
	      properties: audibility of	noise, level  of  (inaudible,  but  in
	      some circumstances, otherwise problematic) shaped	high frequency
	      noise, and processing speed.
	      See  for  graphs  of
	      the different noise-shaping curves.
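
	      The basic idea of TPDF dither (without noise-shaping) can be
	      sketched as follows: the difference of two uniform random
	      numbers has a triangular distribution, and it is added to the
	      signal before rounding to the target sample size.  The peak
	      amplitude of one least-significant bit is a conventional choice
	      assumed here; this is not SoX's implementation, and
	      quantise_tpdf is an illustrative name.

```python
import random


def quantise_tpdf(samples, bits, rng):
    """Quantise floats in [-1, 1) to signed integers with TPDF dither.

    rng is a random.Random instance, passed in so that results can be
    reproduced.  The dither spans (-1, 1) LSB (an assumed, conventional
    amplitude).
    """
    scale = 2 ** (bits - 1)
    lowest, highest = -scale, scale - 1
    out = []
    for s in samples:
        noise = rng.random() - rng.random()   # triangular PDF, zero mean
        q = round(s * scale + noise)
        out.append(min(highest, max(lowest, q)))
    return out


if __name__ == "__main__":
    rng = random.Random(1234)
    print(quantise_tpdf([0.0, 0.25, -0.5, 0.999], 16, rng))
```

	      Seeding the generator makes the noise repeatable between runs,
	      which is the behaviour the text associates with giving the -R
	      global option.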

	      The  -S  option selects a	slightly `sloped' TPDF,	biased towards
	      higher frequencies.  It can be used at any sampling rate, but
	      below about 22kHz plain TPDF is probably better, and above about
	      37kHz noise-shaping (if available) is probably better.

	      The -a option enables a mode where dithering (and	 noise-shaping
	      if  applicable) are automatically	enabled	only when needed.  The
	      most likely use for this is when applying	fade in	or out	to  an
	      already  dithered	 file, so that the redithering applies only to
	      the faded	portions.  However, auto dithering is not  fool-proof,
	      so  the  fades should be carefully checked for any noise modula-
	      tion; if this occurs, then either	re-dither the whole  file,  or
	      use trim, fade, and concatenate.

	      The -p option allows overriding the target precision.

	      If the SoX global option -R is not given, then the
	      pseudo-random number generator used to generate the white	 noise
	      will  be	`reseeded', i.e. the generated noise will be different
	      between invocations.

	      This effect should not be	followed by any	other effect that  af-
	      fects the	audio.

	      See also the `Dithering' section above.

       downsample [factor(2)]
	      Downsample  the  signal by an integer factor: Only the first out
	      of each factor samples is	retained, the others are discarded.

	      No decimation filter is applied.	If the input is	not a properly
	      bandlimited  baseband  signal, aliasing will occur.  This	may be
	      desirable, e.g., for frequency translation.
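
	      Keeping the first of every factor samples, and the frequency
	      folding that results when no filter is applied, can be sketched
	      as below.  Both function names are illustrative.

```python
def downsample(samples, factor=2):
    """Keep the first sample of each group of `factor`, drop the rest."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


def alias_frequency(f_hz, new_rate_hz):
    """Frequency at which a tone at f_hz appears after sampling at
    new_rate_hz with no anti-aliasing filter."""
    nearest_multiple = new_rate_hz * round(f_hz / new_rate_hz)
    return abs(f_hz - nearest_multiple)


if __name__ == "__main__":
    print(downsample(list(range(10)), 3))
    # A 15 kHz tone in 48 kHz audio, downsampled by 2 (new rate 24 kHz):
    print(alias_frequency(15000, 24000))
```

	      Tones below half the new rate are left where they are; anything
	      above folds back into the band, which is the frequency
	      translation mentioned above.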

	      For a general resampling effect with  anti-aliasing,  see	 rate.
	      See also upsample.

       earwax Makes  audio  easier to listen to	on headphones.	Adds `cues' to
	      44.1kHz stereo (i.e. audio CD format) audio so  that  when  lis-
	      tened  to	 on  headphones	 the stereo image is moved from	inside
	      your head	(standard for headphones) to outside and in  front  of
	      the listener (standard for speakers).

       echo gain-in gain-out <delay decay>
	      Add  echoing  to	the audio.  Echoes are reflected sound and can
	      occur naturally amongst mountains	(and  sometimes	 large	build-
	      ings)  when  talking  or	shouting; digital echo effects emulate
	      this behaviour and are often used	to help	fill out the sound  of
	      a	 single	 instrument or vocal.  The time	difference between the
	      original signal and the reflection is the	 `delay'  (time),  and
	      the  loudness  of	the reflected signal is	the `decay'.  Multiple
	      echoes can have different	delays and decays.

	      Each given delay decay pair gives	the delay in milliseconds  and
	      the  decay  (relative to gain-in)	of that	echo.  Gain-out	is the
	      volume of the output.  For example, this will make it sound as
	      if there are twice as many instruments as are actually playing:
		 play lead.aiff	echo 0.8 0.88 60 0.4
	      If the delay is very short, then it sounds like a (metallic)
	      robot playing music:
		 play lead.aiff	echo 0.8 0.88 6	0.4
	      A longer delay will sound like an open air concert in the
	      mountains:
		 play lead.aiff	echo 0.8 0.9 1000 0.3
	      One mountain more, and:
		 play lead.aiff	echo 0.8 0.9 1000 0.3 1800 0.25
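
	      The parameters above can be read as a simple mix of the input
	      with delayed copies of itself.  The sketch below is one such
	      reading, not SoX's implementation: it assumes each echo is the
	      original input delayed and multiplied by its decay, that the
	      direct signal is multiplied by gain-in, and that the sum is
	      multiplied by gain-out.  The function name and the list-based
	      signal are illustrative.

```python
def echo(samples, rate, gain_in, gain_out, taps):
    """Mix samples with delayed, attenuated copies of themselves.

    taps is a list of (delay_ms, decay) pairs.  The result is longer
    than the input by the longest delay, so that the echoes can ring
    out.  The scaling used here is an assumption (see the lead-in).
    """
    delays = [(int(round(ms * rate / 1000.0)), decay) for ms, decay in taps]
    tail = max((d for d, _ in delays), default=0)
    out = []
    for n in range(len(samples) + tail):
        direct = samples[n] if n < len(samples) else 0.0
        mixed = gain_in * direct
        for d, decay in delays:
            if 0 <= n - d < len(samples):
                mixed += decay * samples[n - d]
        out.append(gain_out * mixed)
    return out


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    # 1 kHz sample rate keeps the numbers small: 2 ms is 2 samples.
    print(echo(impulse, 1000, 0.8, 1.0, [(2, 0.5)]))
```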

       echos gain-in gain-out <delay decay>
	      Add  a  sequence	of echoes to the audio.	 Each delay decay pair
	      gives the	delay in milliseconds and the decay (relative to gain-
	      in) of that echo.	 Gain-out is the volume	of the output.

	      Like the echo effect, but echos stands for `ECHO in Sequel', that is
	      the first	echos takes the	input, the second the  input  and  the
	      first  echos,  the  third	the input and the first	and the	second
	      echos, ... and so	on.  Care should be taken using	many echos;  a
	      single echos has the same	effect as a single echo.

	      The sample will be bounced twice in symmetric echos:
		 play lead.aiff	echos 0.8 0.7 700 0.25 700 0.3
	      The sample will be bounced twice in asymmetric echos:
		 play lead.aiff	echos 0.8 0.7 700 0.25 900 0.3
	      The sample will sound as if played in a garage:
		 play lead.aiff	echos 0.8 0.7 40 0.25 63 0.3

       equalizer frequency[k] width[q|o|h|k] gain
	      Apply  a	two-pole  peaking equalisation (EQ) filter.  With this
	      filter, the signal-level at and around a selected	frequency  can
	      be increased or decreased, whilst	(unlike	band-pass and band-re-
	      ject filters) that at all	other frequencies is unchanged.

	      frequency	gives the filter's central frequency in	Hz, width, the
	      band-width,  and	gain  the  required gain or attenuation	in dB.
	      Beware of	Clipping when using a positive gain.

	      In order to produce complex equalisation curves, this effect can
	      be given several times, each with	a different central frequency.

	      The filter is described in detail	in [1].

	      This effect supports the --plot global option.

	      See also bass and	treble for shelving equalisation effects.

       fade [type] fade-in-length [stop-position(=) [fade-out-length]]
	      Apply a fade effect to the beginning, end, or both of the	audio.

	      An  optional  type  can  be specified to select the shape	of the
	      fade curve: q for	quarter	of a sine wave,	 h  for	 half  a  sine
	      wave,  t for linear (`triangular') slope,	l for logarithmic, and
	      p	for inverted parabola.	The default is logarithmic.
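
	      The curve shapes can be pictured as functions that map fade
	      progress (0 at the start of a fade-in, 1 at its end) to a gain
	      between 0 and 1.  The formulae below are natural readings of
	      the shape names and are assumptions, not taken from the SoX
	      sources; the logarithmic shape is left out because its range
	      is an implementation choice.

```python
import math

# Progress x runs from 0.0 (silent) to 1.0 (full volume).
FADE_SHAPES = {
    "t": lambda x: x,                                   # linear
    "q": lambda x: math.sin(x * math.pi / 2.0),         # quarter sine
    "h": lambda x: (1.0 - math.cos(x * math.pi)) / 2.0, # half sine
    "p": lambda x: 1.0 - (1.0 - x) ** 2,                # inverted parabola
}


def fade_in_gains(shape, count):
    """Return `count` gain values (count >= 2) for a fade-in."""
    curve = FADE_SHAPES[shape]
    return [curve(i / (count - 1)) for i in range(count)]


if __name__ == "__main__":
    for name in "tqhp":
        print(name, [round(g, 3) for g in fade_in_gains(name, 5)])
```

	      A fade-out would use the same curves with the progress value
	      running from 1 down to 0.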

	      A	fade-in	starts from the	first  sample  and  ramps  the	signal
	      level  from  0  to  full	volume over the	time given as fade-in-
	      length.  Specify 0 if no fade-in is wanted.

	      For fade-outs, the audio will be truncated at stop-position  and
	      the  signal level	will be	ramped from full volume	down to	0 over
	      an interval of fade-out-length  before  the  stop-position.   If
	      fade-out-length  is not specified, it defaults to	the same value
	      as fade-in-length.  No fade-out is performed if stop-position is
	      not  specified.	If the audio length can	be determined from the
	      input file header	and any	previous effects,  then	 -0  (or,  for
	      historical reasons, 0) may be specified for stop-position	to in-
	      dicate the usual case of a fade-out that ends at the end of  the
	      input audio stream.

	      Any time specification may be used for fade-in-length and
	      fade-out-length.

	      See also the splice effect.

       fir [coefs-file|coefs]
	      Use SoX's	FFT convolution	engine with given FIR  filter  coeffi-
	      cients.	If  a single argument is given then this is treated as
	      the name of a file containing the	 filter	 coefficients  (white-
	      space  separated;	may contain `#'	comments).  If the given file-
	      name is `-', or if no argument is	given, then  the  coefficients
	      are  read	 from the `standard input' (stdin); otherwise, coeffi-
	      cients may be given on the command line.	Examples:
		 sox infile outfile fir	0.0195 -0.082 0.234 0.891 -0.145 0.043
		 sox infile outfile fir	coefs.txt
	      with coefs.txt containing
		 # HP filter
		 # freq=10000

	      This effect supports the --plot global option.

       flanger [delay depth regen width	speed shape phase interp]
	      Apply a flanging effect to the audio.  See [3]  for  a  detailed
	      description of flanging.

	      All parameters are optional (right to left).

			Range	  Default   Description
	      delay	0 - 30	     0	    Base delay in milliseconds.
	      depth	0 - 10	     2	    Added swept	delay in milliseconds.
	      regen    -95 - 95	     0	    Percentage regeneration (delayed
					    signal feedback).
	      width    0 - 100	    71	    Percentage of delayed signal mixed
					    with original.
	      speed    0.1 - 10	    0.5	    Sweeps per second (Hz).
	      shape		    sin	    Swept wave shape: sine|triangle.
	      phase    0 - 100	    25	    Swept wave percentage phase-shift
					    for	multi-channel (e.g. stereo)
					    flange; 0 =	100 = same phase on
					    each channel.
	      interp		    lin	    Digital delay-line interpolation:

       gain [-e|-B|-b|-r] [-n] [-l|-h] [gain-dB]
	      Apply  amplification  or attenuation to the audio	signal,	or, in
	      some cases, to some of its channels.  Note that use  of  any  of
	      -e, -B, -b, -r, or -n requires temporary file space to store the
	      audio to be  processed,  so  may	be  unsuitable	for  use  with
	      `streamed' audio.

	      Without  other  options,	gain-dB	 is  used to adjust the	signal
	      power level by the given number of dB: positive  amplifies  (be-
	      ware of Clipping), negative attenuates.  With other options, the
	      gain-dB amplification or attenuation is (logically) applied  af-
	      ter the processing due to	those options.

	      Given  the  -e  option,  the  levels  of the audio channels of a
	      multi-channel file are `equalised', i.e.	gain is	applied	to all
	      channels	other than that	with the highest peak level, such that
	      all channels attain the same peak	level (but, without also  giv-
	      ing -n, the audio	is not `normalised').

	      The  -B  (balance) option	is similar to -e, but with -B, the RMS
	      level is used instead of the peak	level.	-B might  be  used  to
	      correct stereo imbalance caused by an imperfect record turntable
	      cartridge.   Note	that unlike -e,	-B might cause some clipping.

	      -b is similar to -B but has clipping protection, i.e.  if	neces-
	      sary  to	prevent	 clipping whilst balancing, attenuation	is ap-
	      plied to all channels.  Note, however, that in conjunction  with
	      -n, -B and -b are	synonymous.

	      The  -r option is	used in	conjunction with a prior invocation of
	      gain with	the -h option -	see below for details.

	      The -n option normalises the audio to 0dB	FSD; it	is often  used
	      in  conjunction  with  a negative	gain-dB	to the effect that the
	      audio is normalised to a given level below 0dB.  For example,
		 sox infile outfile gain -n
	      normalises to 0dB, and
		 sox infile outfile gain -n -3
	      normalises to -3dB.
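
	      In terms of sample values, a gain of g dB multiplies the
	      amplitude by 10^(g/20), and normalising to a level means
	      scaling the audio so that its peak sits at that level.  The
	      sketch below shows the arithmetic on a plain list of floats;
	      the function names are illustrative and this is not SoX's
	      implementation.

```python
def db_to_ratio(db):
    """Convert a gain in dB to an amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def normalise(samples, level_db=0.0):
    """Scale samples so that the peak is at level_db relative to 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)       # silence cannot be normalised
    factor = db_to_ratio(level_db) / peak
    return [s * factor for s in samples]


if __name__ == "__main__":
    audio = [0.1, -0.25, 0.2]
    print(normalise(audio))        # peak moved to 0 dB
    print(normalise(audio, -3.0))  # peak moved to -3 dB
```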

	      The -l option invokes a simple limiter, e.g.
		 sox infile outfile gain -l 6
	      will apply 6dB of gain but never clip.  Note that limiting by
	      more than a few dB, more than occasionally within a piece of
	      audio, is not recommended as it can cause audible distortion.
	      See the compand effect for a more capable limiter.

	      The  -h  option  is  used	to apply gain to provide head-room for
	      subsequent processing.  For example, with
		 sox infile outfile gain -h bass +6
	      6dB of attenuation will be applied prior to  the	bass  boosting
	      effect  thus  ensuring  that  it will not	clip.  Of course, with
	      bass, it is obvious how much headroom will be needed,  but  with
	      other  effects  (e.g.   rate, dither) it is not always as	clear.
	      Another advantage	of using gain -h rather	than an	 explicit  at-
	      tenuation, is that if the	headroom is not	used by	subsequent ef-
	      fects, it	can be reclaimed with gain -r, for example:
		 sox infile outfile gain -h bass +6 rate 44100 gain -r
	      The above	effects	chain guarantees never to clip nor amplify; it
	      attenuates if necessary to prevent clipping, but by only as much
	      as is needed to do so.

	      Output formatting	(dithering and bit-depth reduction)  also  re-
	      quires headroom (which cannot be `reclaimed'), e.g.
		 sox infile outfile gain -h bass +6 rate 44100 gain -rh	dither
	      Here, the second gain invocation reclaims as much of the
	      headroom as it can from the preceding effects, but retains as
	      much headroom as is needed for subsequent processing.  The SoX
	      global option -G can be given to automatically invoke gain -h
	      and gain -r.

	      See also the norm	and vol	effects.

       highpass|lowpass	[-1|-2]	frequency[k] [width[q|o|h|k]]
	      Apply  a	high-pass or low-pass filter with 3dB point frequency.
	      The filter can be	either single-pole (with -1),  or  double-pole
	      (the  default,  or  with -2).  width applies only	to double-pole
	      filters; the default is Q	= 0.707	and gives  a  Butterworth  re-
	      sponse.	The  filters roll off at 6dB per pole per octave (20dB
	      per pole per decade).  The double-pole filters are described  in
	      detail in	[1].
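
	      The roll-off figures can be checked against the textbook
	      Butterworth magnitude response.  The sketch below uses the
	      analogue prototype, which is an assumption made for simplicity:
	      a digital filter follows it closely well below half the sample
	      rate but departs from it near that limit.  Function names are
	      illustrative.

```python
import math


def lowpass_db(f_hz, cutoff_hz, poles):
    """Gain in dB of an analogue Butterworth low-pass prototype."""
    return -10.0 * math.log10(1.0 + (f_hz / cutoff_hz) ** (2 * poles))


def highpass_db(f_hz, cutoff_hz, poles):
    """Gain in dB of an analogue Butterworth high-pass prototype."""
    return -10.0 * math.log10(1.0 + (cutoff_hz / f_hz) ** (2 * poles))


if __name__ == "__main__":
    for poles in (1, 2):
        at_cutoff = lowpass_db(1000.0, 1000.0, poles)
        per_octave = lowpass_db(16000.0, 1000.0, poles) - lowpass_db(8000.0, 1000.0, poles)
        print(f"{poles} pole(s): {at_cutoff:.2f} dB at cutoff, "
              f"{per_octave:.2f} dB over one octave far above it")
```

	      Far from the cutoff the slope approaches 6dB per octave for
	      each pole, and the gain at the cutoff itself is about -3dB
	      regardless of the number of poles.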

	      These effects support the	--plot global option.

	      See also sinc for	filters	with a steeper roll-off.

       hilbert [-n taps]
	      Apply  an	 odd-tap  Hilbert transform filter, phase-shifting the
	      signal by	90 degrees.

	      This is used in many matrix coding schemes and for analytic sig-
	      nal  generation.	 The process is	often written as a multiplica-
	      tion by i	(or j),	the imaginary unit.

	      An odd-tap Hilbert transform filter has a	bandpass  characteris-
	      tic,  attenuating	the lowest and highest frequencies.  Its band-
	      width can	be controlled by the number of filter taps, which  can
	      be  specified with -n.  By default, the number of	taps is	chosen
	      for a cutoff frequency of	about 75 Hz.

	      This effect supports the --plot global option.

       ladspa [-l|-r] module [plugin] [argument	...]
	      Apply a LADSPA [5] (Linux	Audio Developer's Simple  Plugin  API)
	      plugin.	Despite	 the name, LADSPA is not Linux-specific, and a
	      wide range of effects is available as LADSPA  plugins,  such  as
	      cmt  [6]	(the Computer Music Toolkit) and Steve Harris's	plugin
	      collection [7]. The first	argument is  the  plugin  module,  the
	      second  the  name	 of the	plugin (a module can contain more than
	      one plugin), and any other arguments are for the	control	 ports
	      of  the plugin. Missing arguments	are supplied by	default	values
	      if possible.

	      Normally,	the number of input ports of the plugin	must match the
	      number  of input channels, and the number	of output ports	deter-
	      mines the	output channel count.  However,	the -r (replicate) op-
	      tion allows cloning a mono plugin	to handle multi-channel	input.

	      Some  plugins introduce latency which SoX	may optionally compen-
	      sate for.	 The -l	(latency  compensation)	 option	 automatically
	      compensates  for latency as reported by the plugin via an	output
	      control port named "latency".

	      If found,	the environment	variable LADSPA_PATH will be  used  as
	      search path for plugins.

       loudness	[gain [reference]]
	      Loudness control - similar to the gain effect, but provides
	      equalisation for the human auditory system.  See for a detailed
	      description of loudness.  The gain is adjusted by the given gain
	      parameter
	      (usually negative) and the signal	equalised according to ISO 226
	      w.r.t. a reference level of 65dB,	though an  alternative	refer-
	      ence level may be	given if the original audio has	been equalised
	      for some other optimal level.  A default gain of -10dB  is  used
	      if a gain	value is not given.

	      See also the gain	effect.

       lowpass [-1|-2] frequency[k] [width[q|o|h|k]]
	      Apply  a	low-pass  filter.  See the description of the highpass
	      effect for details.

       mcompand	"attack1,decay1{,attack2,decay2}
	      [gain  [initial-volume-dB	 [delay]]]"  {crossover-freq[k]	  "at-

	      The multi-band compander is similar to the single-band compander
	      but the audio is first divided into bands	 using	Linkwitz-Riley
	      cross-over filters and a separately specifiable compander	run on
	      each band.  See the compand effect for the definition of its pa-
	      rameters.	  Compand  parameters  are  specified  between	double
	      quotes and the crossover frequency for that  band	 is  given  by
	      crossover-freq; these can	be repeated to create multiple bands.

	      For  example,  the following (one	long) command shows how	multi-
	      band companding is typically used	in FM radio:
		 play track1.wav gain -3 sinc 8000- 29 100 mcompand \
		   "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
		   "0.003,0.05 -47,-40,-34,-34,-17,-33"	400 \
		   "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
		   "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
		   "0,0.025 -38,-31,-28,-28,-0,-25" \
		   gain	15 highpass 22 highpass	22 sinc	-n 255 -b 16 -17500 \
		   gain	9 lowpass -1 17801
	      The audio	file is	played with a simulated	 FM  radio  sound  (or
	      broadcast	 signal	 condition if the lowpass filter at the	end is
	      skipped).  Note that the pipeline is set up with US-style 75us
	      pre-emphasis.

	      See also compand for a single-band companding effect.

       noiseprof [profile-file]
	      Calculate	 a  profile  of	 the audio for use in noise reduction.
	      See the description of the noisered effect for details.

       noisered	[profile-file [amount]]
	      Reduce noise in the audio	signal	by  profiling  and  filtering.
	      This effect is moderately	effective at removing consistent back-
	      ground noise such	as hiss	or hum.	 To use	it, first run SoX with
	      the  noiseprof  effect  on a section of audio that ideally would
	      contain silence but in fact contains noise - such	 sections  are
	      typically	 found	at  the	 beginning  or the end of a recording.
	      noiseprof	will write out a noise profile to profile-file,	or  to
	      stdout if	no profile-file	or if `-' is given.  E.g.
		 sox speech.wav	-n trim	0 1.5 noiseprof	speech.noise-profile
	      To  actually remove the noise, run SoX again, this time with the
	      noisered effect; noisered	will reduce noise according to a noise
	      profile  (which  was generated by	noiseprof), from profile-file,
	      or from stdin if no profile-file or if `-' is given.  E.g.
		 sox speech.wav	cleaned.wav noisered speech.noise-profile 0.3
	      How much noise should be removed is specified by amount, a number
	      between  0 and 1 with a default of 0.5.  Higher numbers will re-
	      move more	noise but present a  greater  likelihood  of  removing
	      wanted  components  of  the  audio  signal.  Before replacing an
	      original recording with a	noise-reduced version, experiment with
	      different	 amount	values to find the optimal one for your	audio;
	      use headphones to	check that you are  happy  with	 the  results,
	      paying particular	attention to quieter sections of the audio.

	      On  most systems,	the two	stages - profiling and reduction - can
	      be combined using	a pipe,	e.g.
		 sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered

       norm [dB-level]
	      Normalise	the audio.  norm is just an alias for gain -n; see the
	      gain effect for details.

       oops   Out  Of  Phase  Stereo  effect.  Mixes stereo to twin-mono where
	      each mono	channel	contains the difference	between	the  left  and
	      right stereo channels.  This is sometimes	known as the `karaoke'
	      effect as	it often has the effect	of removing most or all	of the
	      vocals from a recording.	It is equivalent to remix 1,2i 1,2i.

       overdrive [gain(20) [colour(20)]]
	      Non linear distortion.  The colour parameter controls the	amount
	      of even harmonic content in the over-driven output.

       pad { length[@position(=)] }
	      Pad the audio with silence, at the beginning, the	 end,  or  any
	      specified	points through the audio.  length is the amount	of si-
	      lence to insert and position the position	 in  the  input	 audio
	      stream  at  which	to insert it.  Any number of lengths and posi-
	      tions may	be specified, provided that a  specified  position  is
	      not less than the previous one, and any time specification may
	      be used for them.	 position is optional for the first  and  last
	      lengths specified	and if omitted correspond to the beginning and
	      the end of the audio respectively.  For  example,	 pad  1.5  1.5
	      adds  1.5	 seconds  of silence padding at	each end of the	audio,
	      whilst pad 4000s@3:00 inserts 4000 samples of silence 3  minutes
	      into the audio.  If silence is wanted only at the	end of the au-
	      dio, specify either the end position or  specify	a  zero-length
	      pad at the start.

	      See  also	delay for an effect that can add silence at the	begin-
	      ning of the audio	on a channel-by-channel	basis.

       phaser gain-in gain-out delay decay speed [-s|-t]
	      Add a phasing effect to the audio.  See [3] for a	 detailed  de-
	      scription	of phasing.

	      delay/decay/speed	 gives the delay in milliseconds and the decay
	      (relative	to gain-in) with a modulation speed in Hz.  The	 modu-
	      lation  is either	sinusoidal (-s)	 - preferable for multiple in-
	      struments, or triangular (-t)   -	 gives	single	instruments  a
	      sharper  phasing	effect.	  The decay should be less than	0.5 to
	      avoid feedback, and usually no less than 0.1.  Gain-out  is  the
	      volume of	the output.

	      For example:
		 play snare.flac phaser	0.8 0.74 3 0.4 0.5 -t
		 play snare.flac phaser	0.9 0.85 4 0.23	1.3 -s
	      A	popular	sound:
		 play snare.flac phaser	0.89 0.85 1 0.24 2 -t
	      More severe:
		 play snare.flac phaser	0.6 0.66 3 0.6 2 -t

       pitch [-q] shift	[segment [search [overlap]]]
	      Change the audio pitch (but not tempo).

	      shift  gives  the	 pitch	shift  as positive or negative `cents'
	      (i.e. 100ths of a	semitone).  See	the tempo  effect  for	a  de-
	      scription	of the other parameters.

	      See also the bend, speed,	and tempo effects.

       rate [-q|-l|-m|-h|-v] [override-options]	RATE[k]
	      Change  the audio	sampling rate (i.e. resample the audio)	to any
	      given RATE (even non-integer if this is supported	by the	output
	      file format) using a quality level defined as follows:

	               Quality     Band-width   Rej dB        Typical Use
	      -q       quick       n/a          ~=30 @ Fs/4   playback on ancient hardware
	      -l       low         80%          100           playback on old hardware
	      -m       medium      95%          100           audio playback
	      -h       high        95%          125           16-bit mastering (use with dither)
	      -v       very high   95%          175           24-bit mastering

	      where Band-width is the percentage of the	audio  frequency  band
	      that  is	preserved  and Rej dB is the level of noise rejection.
	      Increasing levels	of resampling quality come at the  expense  of
	      increasing  amounts of time to process the audio.	 If no quality
	      option is	given, the quality  level  used	 is  `high'  (but  see
	      `Playing & Recording Audio' above	regarding playback).

	      The  `quick'  algorithm uses cubic interpolation;	all others use
	      band-limited interpolation.  By default, all algorithms  have  a
	      `linear'	phase  response; for `medium', `high' and `very	high',
	      the phase	response is configurable (see below).

	      The rate effect is invoked  automatically	 if  SoX's  -r	option
	      specifies	a rate that is different to that of the	input file(s).
	      Alternatively, if	this effect is given explicitly, then SoX's -r
	      option  need  not	be given.  For example,	the following two com-
	      mands are	equivalent:
		 sox input.wav -r 48k output.wav bass -b 24
		 sox input.wav	      output.wav bass -b 24 rate 48k
	      though the second command is more flexible as it allows rate options
	      to be given, and allows the effects to be ordered arbitrarily.

				    *	     *	      *

	      Warning: technically detailed discussion follows.

	      The simple quality selection described above  provides  settings
	      that satisfy the needs of	the vast majority of resampling	tasks.
	      Occasionally, however, it	may be desirable to fine-tune the  re-
	      sampler's	 filter	 response;  this  can  be achieved using over-
	      ride options, as detailed	in the following table:

	      -M/-I/-L	   Phase response = minimum/intermediate/linear
	      -s	   Steep filter	(band-width = 99%)
	      -a	   Allow aliasing/imaging above	the pass-band
	      -b 74-99.7   Any band-width %
	      -p 0-100	   Any phase response (0 = minimum, 25 = intermediate,
			   50 =	linear,	100 = maximum)

	      N.B.   Override options cannot be	used with the `quick' or `low'
	      quality algorithms.

	      All resamplers use filters  that	can  sometimes	create	`echo'
	      (a.k.a.	`ringing')  artefacts  with  transient signals such as
	      those that occur with `finger snaps' or other highly  percussive
	      sounds.	Such  artefacts	 are much more noticeable to the human
	      ear if they occur	before the transient (`pre-echo') than if they
	      occur after it (`post-echo').  Note that the frequency of any such
	      artefacts	is related to the smaller of the original and new sam-
	      pling rates but that if this is at least 44.1kHz,	then the arte-
	      facts will lie outside the range of human	hearing.

	      A	phase response setting may be used to control the distribution
	      of  any  transient  echo	between	`pre' and `post': with minimum
	      phase, there is no pre-echo but the longest post-echo; with lin-
	      ear  phase,  pre	and  post echo are in equal amounts (in	signal
	      terms, but not audibility	terms);	the intermediate phase setting
	      attempts to find the best	compromise by selecting	a small	length
	      (and level) of pre-echo and a medium-length post-echo.

	      Minimum, intermediate, or	linear phase response is selected  us-
	      ing  the	-M,  -I,  or -L	option;	a custom phase response	can be
	      created with the -p option.  Note	that phase  responses  between
	      `linear' and `maximum' (greater than 50) are rarely useful.

	      A	resampler's band-width setting determines how much of the fre-
	      quency content of	the original signal (w.r.t. the	original  sam-
	      ple rate when up-sampling, or the	new sample rate	when down-sam-
	      pling) is	preserved during conversion.  The term `pass-band'  is
	      used  to	refer  to  all	frequencies up to the band-width point
	      (e.g. for	44.1kHz	sampling rate, and a resampling	band-width  of
	      95%,  the	 pass-band  represents	frequencies from 0Hz (D.C.) to
	      circa 21kHz).  Increasing	the resampler's	band-width results  in
	      a	 slower	 conversion  and can increase transient	echo artefacts
	      (and vice	versa).
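
	      As a rough illustration of the pass-band arithmetic above, the
	      following Bourne shell sketch computes the upper pass-band edge.
	      The function name passband_hz is hypothetical (it is not part of
	      SoX), and the sketch assumes that the band-width percentage is
	      taken relative to half of the smaller of the two sampling rates,
	      as the text describes.

```shell
#!/bin/sh
# passband_hz IN_RATE OUT_RATE BANDWIDTH_PERCENT
# Hypothetical helper (not part of SoX): prints the approximate upper
# edge of the resampler's pass-band in Hz.  The band-width is taken
# relative to half the sampling rate of the smaller of the two rates.
passband_hz() {
    awk -v in_rate="$1" -v out_rate="$2" -v bw="$3" 'BEGIN {
        rate = (in_rate < out_rate) ? in_rate : out_rate
        printf "%.1f\n", rate * bw / 200
    }'
}

passband_hz 44100 96000 95    # up-sampling: the original rate governs
passband_hz 96000 44100 95    # down-sampling: the new rate governs
```

	      Both calls print 20947.5, i.e. the `circa 21kHz' quoted above for
	      a 44.1kHz sampling rate and a band-width of 95%.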

	      The -s `steep filter' option changes resampling band-width  from
	      the default 95% (based on	the 3dB	point),	to 99%.	 The -b	option
	      allows the band-width to be  set	to  any	 value	in  the	 range
	      74-99.7  %, but note that	band-width values greater than 99% are
	      not recommended for normal use as	they can cause excessive tran-
	      sient echo.

	      If the -a	option is given, then aliasing/imaging above the pass-
	      band is allowed.	For example, with 44.1kHz sampling rate, and a
	      resampling  band-width of	95%, this means	that frequency content
	      above 21kHz can be distorted; however, since this	is  above  the
	      pass-band	 (i.e.	 above the highest frequency of	interest/audi-
	      bility), this may	not be a problem.  The	benefits  of  allowing
	      aliasing/imaging	are  reduced  processing time, and reduced (by
	      almost half) transient echo artefacts.  Note that	if this	option
	      is  given,  then	the  minimum  band-width allowable with	-b in-
	      creases to 85%.

		 sox input.wav -b 16 output.wav	rate -s	-a 44100 dither	-s
	      default (high) quality resampling; overrides: steep filter,  al-
	      low  aliasing;  to  44.1kHz  sample rate;	noise-shaped dither to
	      16-bit WAV file.
		 sox input.wav -b 24 output.aiff rate -v -I -b 90 48k
	      very high	quality	 resampling;  overrides:  intermediate	phase,
	      band-width 90%; to 48k sample rate; store output to 24-bit AIFF file.

				    *	     *	      *

	      The pitch	and speed effects use the rate effect at their core.

       remix [-a|-m|-p]	<out-spec>
	      out-spec	= in-spec{,in-spec} | 0
	      in-spec	= [in-chan][-[in-chan2]][vol-spec]
	      vol-spec	= p|i|v[volume]

	      Select and mix input audio channels into output audio  channels.
	      Each  output channel is specified, in turn, by a given out-spec:
	      a	list of	contributing input channels and	volume specifications.

	      Note that	this effect operates on	the audio channels within  the
	      SoX effects processing chain; it should not be confused with the
	      -m global	option (where multiple files are  mix-combined	before
	      entering the effects chain).

	      An  out-spec  contains comma-separated input channel-numbers and
	      hyphen-delimited channel-number ranges; alternatively, 0 may  be
	      given to create a	silent output channel.	For example,
		 sox input.wav output.wav remix	6 7 8 0
	      creates  an output file with four	channels, where	channels 1, 2,
	      and 3 are	copies of channels 6, 7, and 8 in the input file,  and
	      channel 4	is silent.  Whereas
		 sox input.wav output.wav remix	1-3,7 3
	      creates  a  (somewhat bizarre) stereo output file	where the left
	      channel is a mix-down of input channels 1, 2, 3, and 7, and  the
	      right channel is a copy of input channel 3.

	      Where  a	range of channels is specified,	the channel numbers to
	      the left and right of the	hyphen are optional and	default	 to  1
	      and to the number	of input channels respectively.	Thus
		 sox input.wav output.wav remix	-
	      performs a mix-down of all input channels	to mono.

	      By  default,  where an output channel is mixed from multiple (n)
	      input channels, each input channel will be scaled	by a factor of
	      1/n.  Custom mixing volumes can be set by following a given input
	      channel or range of input channels with a vol-spec (volume
	      specification).  This is one of the letters p, i,	or v, followed
	      by a volume number, the meaning of which depends	on  the	 given
	      letter and is defined as follows:

		     Letter   Volume number	   Notes
		       p      power adjust in dB   0 = no change
		       i      power adjust in dB   As `p', but invert
						   the audio
		       v      voltage multiplier   1 = no change, 0.5
						   ~= 6dB attenuation,
						   2 ~= 6dB gain, -1 = invert

	      If  an out-spec includes at least	one vol-spec then, by default,
	      1/n scaling is not applied to any other channels in the same
	      out-spec (though may be in other out-specs).  The	-a (automatic)
	      option however, can be given to retain the automatic scaling  in
	      this case.  For example,
		 sox input.wav output.wav remix	1,2 3,4v0.8
	      results in channel level multipliers of 0.5,0.5 1,0.8, whereas
		 sox input.wav output.wav remix	-a 1,2 3,4v0.8
	      results in channel level multipliers of 0.5,0.5 0.5,0.8.

	      The  -m  (manual)	 option	 disables all automatic	volume adjust-
	      ments, so
		 sox input.wav output.wav remix	-m 1,2 3,4v0.8
	      results in channel level multipliers of 1,1 1,0.8.

	      The volume number	is optional and	omitting it corresponds	to  no
	      volume change; however, the only case in which this is useful is
	      in conjunction with i.  For example,  if	input.wav  is  stereo,
		 sox input.wav output.wav remix	1,2i
	      is a mono	equivalent of the oops effect.

	      If the -p option is given, then any automatic 1/n scaling is
	      replaced by 1/sqrt(n) (`power') scaling; this gives a louder
	      mix but one that might occasionally clip.
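
	      The two automatic scaling rules can be compared numerically with
	      a small Bourne shell sketch.  The function name remix_scale and
	      its mode argument are hypothetical (they are not part of SoX); the
	      sketch only evaluates the 1/n and 1/sqrt(n) factors described
	      above.

```shell
#!/bin/sh
# remix_scale N [MODE]
# Hypothetical helper (not part of SoX): prints the per-channel
# multiplier applied when N input channels are mixed into one output
# channel.  MODE "default" gives 1/n; MODE "power" gives 1/sqrt(n),
# corresponding to the -p option.
remix_scale() {
    awk -v n="$1" -v mode="${2:-default}" 'BEGIN {
        if (mode == "power")
            printf "%.4f\n", 1 / sqrt(n)
        else
            printf "%.4f\n", 1 / n
    }'
}

remix_scale 4          # prints 0.2500
remix_scale 4 power    # prints 0.5000
```

	      For four channels the `power' factor is twice the default one,
	      which is why the -p mix is louder and more likely to clip.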

				    *	     *	      *

	      One use of the remix effect is to	split an audio file into a set
	      of files,	each containing	one of the  constituent	 channels  (in
	      order to perform subsequent processing on	individual audio chan-
	      nels).  Where more than a	few channels are  involved,  a	script
	      such as the following (Bourne shell script) is useful:
	      #!/bin/sh
	      chans=`soxi -c "$1"`
	      while [ $chans -ge 1 ]; do
		 chans0=`printf %02i $chans`   # 2 digits hence up to 99 chans
		 out=`echo "$1"|sed "s/\(.*\)\.\(.*\)/\1-$chans0.\2/"`
		 sox "$1" "$out" remix $chans
		 chans=`expr $chans - 1`
	      done
	      If  a  file  input.wav containing	six audio channels were	given,
	      the script would produce six  output  files:  input-01.wav,  in-
	      put-02.wav, ..., input-06.wav.

	      See also the swap	effect.

       repeat [count(1)|-]
	      Repeat  the  entire  audio  count	times, or once if count	is not
	      given.  The special value	- requests infinite  repetition.   Re-
	      quires  temporary	 file space to store the audio to be repeated.
	      Note that	repeating once yields two copies: the  original	 audio
	      and the repeated audio.
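
	      The resulting length follows directly from the note above; a
	      minimal Bourne shell sketch of the arithmetic is given here.  The
	      function name repeat_total is hypothetical (it is not part of
	      SoX), and the duration is assumed to be given in seconds.

```shell
#!/bin/sh
# repeat_total DURATION [COUNT]
# Hypothetical helper (not part of SoX): prints the length in seconds
# of the audio after "repeat COUNT", i.e. the original audio plus
# COUNT repetitions.  COUNT defaults to 1, matching the synopsis.
repeat_total() {
    awk -v d="$1" -v n="${2:-1}" 'BEGIN { printf "%.3f\n", d * (n + 1) }'
}

repeat_total 10       # original + 1 repeat
repeat_total 2.5 3    # original + 3 repeats
```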

       reverb [-w|--wet-only] [reverberance (50%) [HF-damping (50%)
	      [room-scale (100%) [stereo-depth (100%)
	      [pre-delay (0ms) [wet-gain (0dB)]]]]]]

	      Add  reverberation  to the audio using the `freeverb' algorithm.
	      A	reverberation effect is	sometimes desirable for	concert	 halls
	      that  are	 too  small  or	contain	so many	people that the	hall's
	      natural reverberance is diminished.  Applying a small amount  of
	      stereo  reverb to	a (dry)	mono signal will usually make it sound
	      more natural.  See [3] for a detailed description of reverberation.

	      Note  that  this effect increases	both the volume	and the	length
	      of the audio, so to prevent clipping in these domains, a typical
	      invocation might be:
		 play dry.wav gain -3 pad 0 3 reverb
	      The -w option can	be given to select only	the `wet' signal, thus
	      allowing it to be	processed further, independently of the	 `dry'
	      signal.  E.g.
		 play -m voice.wav "|sox voice.wav -p reverse reverb -w	reverse"
	      for a reverse reverb effect.

       reverse
	      Reverse the audio completely.  Requires temporary file space to
	      store the	audio to be reversed.

       riaa   Apply RIAA vinyl playback	equalisation.  The sampling rate  must
	      be one of: 44.1, 48, 88.2, 96 kHz.

	      This effect supports the --plot global option.

       silence [-l] above-periods [duration threshold[d|%]
	      [below-periods duration threshold[d|%]]]

	      Removes silence from the beginning, middle, or end of the	audio.
	      `Silence'	is determined by a specified threshold.

	      The above-periods	value is used to indicate if audio  should  be
	      trimmed at the beginning of the audio. A value of	zero indicates
	      no silence should	be trimmed from	the beginning. When specifying
	      a non-zero above-periods, it trims audio up until it finds non-
	      silence. Normally, when trimming silence from the beginning of the audio,
	      the  above-periods  will	be 1 but it can	be increased to	higher
	      values to	trim all audio up to a specific	count  of  non-silence
	      periods.	For  example,  if you had an audio file	with two songs
	      that each	contained 2 seconds of silence before  the  song,  you
	      could specify an above-period of 2 to strip out both silence pe-
	      riods and	the first song.

	      When above-periods is non-zero, you must also specify a duration
	      and  threshold.  duration	indicates the amount of	time that non-
	      silence must be detected before it stops trimming audio. By
	      increasing the duration, bursts of noise can be treated as silence
	      and trimmed off.

	      threshold	is used	to indicate what sample	value you should treat
	      as silence.  For digital audio, a	value of 0 may be fine but for
	      audio recorded from analog, you may wish to increase  the	 value
	      to account for background	noise.

	      When optionally trimming silence from the end of the audio, you
	      specify a below-periods count.  In this case, below-periods means
	      to remove all audio after silence is detected.  Normally, this
	      will be a value of 1 but it can be increased to skip over periods
	      of silence that are wanted.  For example, if you have a song
	      with 2 seconds of silence in the middle and 2 seconds at the end,
	      you could set below-periods to a value of 2 to skip over the
	      silence in the middle of the audio.

	      For below-periods, duration specifies a period of	 silence  that
	      must exist before	audio is not copied any	more.  By specifying a
	      higher duration, silence that is wanted can be left in  the  au-
	      dio.   For example, if you have a	song with an expected 1	second
	      of silence in the	middle and 2 seconds of	silence	at the end,  a
	      duration of 2 seconds could be used to skip over the middle silence.

	      Unfortunately, you must know the length of the  silence  at  the
	      end  of  your  audio file	to trim	off silence reliably.  A work-
	      around is	to use the silence effect in combination with the  re-
	      verse  effect.   By  first  reversing the	audio, you can use the
	      above-periods to reliably	trim all audio from  what  looks  like
	      the  front of the	file.  Then reverse the	file again to get back
	      to normal.
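
	      The reverse workaround can be sketched as a single effects chain.
	      The function name trim_ends_effects is hypothetical (it is not
	      part of SoX), and the duration and threshold shown are purely
	      illustrative values that would need tuning for real recordings.

```shell
#!/bin/sh
# trim_ends_effects DURATION THRESHOLD
# Hypothetical helper (not part of SoX): prints an effects chain that
# trims leading silence, reverses the audio, trims what is now the
# leading silence (originally the trailing silence), and reverses the
# audio back to its normal direction.
trim_ends_effects() {
    printf 'silence 1 %s %s reverse silence 1 %s %s reverse\n' \
        "$1" "$2" "$1" "$2"
}

# Print the chain for a 0.1 second duration and a 1% threshold.
trim_ends_effects 0.1t 1%

# Possible use (requires SoX; the file names are placeholders):
#   sox in.wav out.wav $(trim_ends_effects 0.1t 1%)
```

	      Because the reverse effect needs temporary file space for the
	      whole audio, this approach may be unsuitable for very long
	      recordings.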

	      To remove	silence	from the middle	of a file, specify a below-pe-
	      riods  that  is negative.	 This value is then treated as a posi-
	      tive value and is	also used to indicate that the	effect	should
	      restart  processing as specified by the above-periods, making it
	      suitable for removing periods of silence in the middle of the audio.

	      The option -l indicates that below-periods duration length of
	      audio should be left intact at the beginning of each period of
	      silence.  This is useful, for example, if you want to shorten
	      long pauses between words without removing them completely.

	      duration is a time specification with  the  peculiarity  that  a
	      bare number is interpreted as a sample count, not	as a number of
	      seconds.	For specifying seconds,	either use the t suffix	(as in
	      `2t') or specify minutes,	too (as	in `0:02').

	      threshold	 numbers  may be suffixed with d to indicate the value
	      is in decibels, or % to indicate a percentage of	maximum	 value
	      of the sample value (0% specifies	pure digital silence).

	      The following example shows how this effect can be used to start
	      a	recording that does not	contain	the delay at the  start	 which
	      usually  occurs  between	`pressing  the	record button' and the
	      start of the performance:
		 rec parameters	filename other-effects silence 1 5 2%

       sinc [-a	att|-b beta] [-p phase|-M|-I|-L] [-t tbw|-n taps] [freqHP]
       [-freqLP	[-t tbw|-n taps]]
	      Apply  a sinc kaiser-windowed low-pass, high-pass, band-pass, or
	      band-reject filter to the	signal.	 The freqHP and	freqLP parame-
	      ters  give  the frequencies of the 6dB points of a high-pass and
	      low-pass filter that may be invoked individually,	 or  together.
	      If  both are given, then freqHP less than	freqLP creates a band-
	      pass filter, freqHP greater than freqLP  creates	a  band-reject
	      filter.  For example, the	invocations
		 sinc 3k
		 sinc -4k
		 sinc 3k-4k
		 sinc 4k-3k
	      create a high-pass, low-pass, band-pass, and band-reject filter
	      respectively.

	      The default stop-band attenuation	of  120dB  can	be  overridden
	      with  -a;	 alternatively,	the kaiser-window `beta' parameter can
	      be given directly	with -b.

	      The default transition band-width	of 5% of the total band	can be
	      overridden with -t (and tbw in Hertz); alternatively, the	number
	      of filter	taps can be given directly with	-n.

	      If both freqHP and freqLP	are given, then	 a  -t	or  -n	option
	      given  to	 the  left of the frequencies applies to both frequen-
	      cies; one	of these options given to the right of the frequencies
	      applies only to freqLP.

	      The  -p,	-M,  -I, and -L	options	control	the filter's phase re-
	      sponse; see the rate effect for details.

	      This effect supports the --plot global option.

       spectrogram [options]
	      Create a spectrogram of the audio; the audio is  passed  unmodi-
	      fied  through the	SoX processing chain.  This effect is optional
	      -	type sox --help	and check the list of supported	effects	to see
	      if it has	been included.

	      The  spectrogram is rendered in a	Portable Network Graphic (PNG)
	      file, and	shows time in the X-axis, frequency in the Y-axis, and
	      audio  signal magnitude in the Z-axis.  Z-axis values are	repre-
	      sented by	the colour (or optionally the intensity) of the	pixels
	      in  the  X-Y plane.  If the audio	signal contains	multiple chan-
	      nels then	these are shown	from top to bottom starting from chan-
	      nel 1 (which is the left channel for stereo audio).

	      For example, if `my.wav' is a stereo file, then with
		 sox my.wav -n spectrogram
	      a	 spectrogram  of  the  entire file will	be created in the file
	      `spectrogram.png'.  More often though,  analysis	of  a  smaller
	      portion of the audio is required;	e.g. with
		 sox my.wav -n remix 2 trim 20 30 spectrogram
	      the  spectrogram	shows information only from the	second (right)
	      channel, and of thirty seconds of	 audio	starting  from	twenty
	      seconds in.  To analyse a	small portion of the frequency domain,
	      the rate effect may be used, e.g.
		 sox my.wav -n rate 6k spectrogram
	      allows detailed analysis of frequencies up  to  3kHz  (half  the
	      sampling rate) i.e. where	the human auditory system is most sen-
	      sitive.  With
		 sox my.wav -n trim 0 10 spectrogram -x	600 -y 200 -z 100
	      the given	options	control	the size of the	spectrogram's X, Y & Z
	      axes  (in	 this case, the	spectrogram area of the	produced image
	      will be 600 by 200 pixels	in size	and the	Z-axis range  will  be
	      100  dB).	  Note	that  the produced image includes axes legends
	      etc. and so will be a little larger than the specified  spectro-
	      gram size.  In this example:
		 sox -n	-n synth 6 tri 10k:14k spectrogram -z 100 -w kaiser
	      an analysis `window' with	high dynamic range is selected to best
	      display the spectrogram of a swept triangular wave.  For a similar
	      example, append the following to the `chime' command in the
	      description of the delay effect (above):
		 rate 2k spectrogram -X	200 -Z -10 -w kaiser
	      Options are also available to control  the  appearance  (colour-
	      set,  brightness,	 contrast,  etc.) and filename of the spectro-
	      gram; e.g. with
		 sox my.wav -n spectrogram -m -l -o print.png
	      a	spectrogram is created suitable	for printing on	a  `black  and
	      white' printer.

	      Options:

	      -x num Change  the  (maximum)  width (X-axis) of the spectrogram
		     from its default value of 800 pixels to  a	 given	number
		     between 100 and 200000.  See also -X and -d.

	      -X num X-axis  pixels/second;  the default is auto-calculated to
		     fit the given or known audio duration to the X-axis size,
		     or	 100 otherwise.	 If given in conjunction with -d, this
		     option affects the	width of the  spectrogram;  otherwise,
		     it	 affects  the duration of the spectrogram.  num	can be
		     from 1 (low time resolution) to 5000 (high	 time  resolu-
		     tion)  and	need not be an integer.	 SoX may make a	slight
		     adjustment	to the given number for	 processing  quantisa-
		     tion  reasons;  if	 so, SoX will report the actual	number
		     used (viewable when the SoX global	option -V  is  in  ef-
		     fect).  See also -x and -d.

	      -y num Sets the Y-axis size in pixels (per channel); this	is the
		     number of frequency `bins'	used in	the  Fourier  analysis
		     that  produces  the  spectrogram.	N.B. it	can be slow to
		     produce the spectrogram if	this number is	not  one  more
		     than  a  power  of	two (e.g. 129).	 By default the	Y-axis
		     size is chosen automatically (depending on	the number  of
		     channels).  See -Y for an alternative way of setting
		     spectrogram height.

	      -Y num Sets the target total height of the spectrogram(s).   The
		     default  value  is	550 pixels.  Using this	option (and by
		     default), SoX will	choose a height	for  individual	 spec-
		     trogram channels that is one more than a power of two, so
		     the actual	total height may fall short of the given  num-
		     ber.  However, there is also a minimum height per channel
		     so if there are many channels, the number may be exceeded.
		     See -y for an alternative way of setting spectrogram
		     height.

	      -z num Z-axis (colour) range in dB, default 120.	This sets  the
		     dynamic-range  of	the  spectrogram  to  be  -num dBFS to
		     0 dBFS.  Num may range from 20 to	180.   Decreasing  dy-
		     namic-range  effectively  increases the `contrast'	of the
		     spectrogram display, and vice versa.

	      -Z num Sets the upper limit of the Z-axis	in dBFS.   A  negative
		     num  effectively  increases the `brightness' of the spec-
		     trogram display, and vice versa.

	      -q num Sets the Z-axis quantisation, i.e.	the number of  differ-
		     ent  colours  (or	intensities) in	which to render	Z-axis
		     values.   A  small	 number	  (e.g.	  4)   will   give   a
		     `poster'-like  effect  making it easier to	discern	magni-
		     tude bands	of similar level.  Small numbers also  usually
		     result  in	 small	PNG files.  The	number given specifies
		     the number	of colours to use inside the Z-axis range; two
		     colours are reserved to represent out-of-range values.

	      -w name
		     Window:  Hann  (default), Hamming,	Bartlett, Rectangular,
		     Kaiser or Dolph.  The spectrogram is produced  using  the
		     Discrete  Fourier	Transform (DFT)	algorithm.  A signifi-
		     cant parameter to this algorithm is the choice of `window
		     function'.	  By  default,	SoX uses the Hann window which
		     has good all-round	frequency-resolution and dynamic-range
		     properties.   For	better frequency resolution (but lower
		     dynamic-range), select a Hamming window; for  higher  dy-
		     namic-range  (but	poorer frequency-resolution), select a
		     Dolph window.  Kaiser, Bartlett and  Rectangular  windows
		     are also available.

	      -W num Window  adjustment	 parameter.   This can be used to make
		     small adjustments to the Kaiser or	Dolph window shape.  A
		     positive  number (up to ten) increases its	dynamic	range,
		     a negative	number decreases it.

	      -s     Allow slack overlapping of	DFT  windows.	This  can,  in
		     some cases, increase image	sharpness and give greater ad-
		     herence to	the -x value, but at the expense of  a	little
		     spectral loss.

	      -m     Creates a monochrome spectrogram (the default is colour).

	      -h     Selects  a	 high-colour  palette -	less visually pleasing
		     than the default colour palette, but it may make it  eas-
		     ier to differentiate different levels.  If	this option is
		     used in conjunction with -m, the result will be a	hybrid
		     monochrome/colour palette.

	      -p num Permute  the  colours in a	colour or hybrid palette.  The
		     num parameter, from 1 (the default) to 6, selects the
		     permutation.

	      -l     Creates  a	 `printer  friendly'  spectrogram with a light
		     background	(the default has a dark	background).

	      -a     Suppress the display of the axis lines.   This  is	 some-
		     times useful in helping to	discern	artefacts at the spec-
		     trogram edges.

	      -r     Raw spectrogram: suppress the display of axes and legends.

	      -A     Selects  an  alternative, fixed colour-set.  This is pro-
		     vided only	for compatibility with	spectrograms  produced
		     by	another	package.  It should not	normally be used as it
		     has some problems,	not least, a lack  of  differentiation
		     at the bottom end which results in masking of low-level
		     artefacts.

	      -t text
		     Set the image title - text to display above the spectrogram.

	      -c text
		     Set  (or clear) the image comment - text to display below
		     and to the	left of	the spectrogram.

	      -o file
		     Name of the spectrogram output PNG	file,  default	`spec-
		     trogram.png'.   If	 `-' is	given, the spectrogram will be
		     sent to standard output (stdout).

	      Advanced Options:
	      In order to process a smaller section of audio without affecting
	      other  effects or	the output signal (unlike when the trim	effect
	      is used),	the following options may be used.

	      -d duration
		     This option sets the X-axis resolution  such  that	 audio
		     with  the	given duration (a time specification) fits the
		     selected (or default) X-axis width.  For example,
			sox input.mp3 -n spectrogram -d 1:00 stats
		     creates a spectrogram showing the first minute of the
		     audio, whilst the stats effect is applied to the entire
		     audio signal.

		     See also -X for an alternative way of setting the X-axis
		     resolution.

	      -S position(=)
		     Start the spectrogram at the given	 point	in  the	 audio
		     stream.  For example
			sox input.aiff output.wav spectrogram -S 1:00
		     creates a spectrogram showing all but the first minute of
		     the audio (the output file, however, receives the	entire
		     audio stream).

	      For the ability to perform off-line processing of	spectral data,
	      see the stat effect.
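
	      The description of -y notes that producing the spectrogram can be
	      slow unless the Y-axis size is one more than a power of two.  A
	      minimal Bourne shell sketch for picking such a value is given
	      here; the function name best_y is hypothetical (it is not part of
	      SoX), and the sketch does not check any limits that SoX itself
	      may impose on -y.

```shell
#!/bin/sh
# best_y MAX_HEIGHT
# Hypothetical helper (not part of SoX): prints the largest value of
# the form 2^k + 1 that does not exceed MAX_HEIGHT (the smallest
# result it can return is 3).
best_y() {
    p=2
    while [ $((p * 2 + 1)) -le "$1" ]; do
        p=$((p * 2))
    done
    echo $((p + 1))
}

best_y 200    # prints 129
best_y 550    # prints 513

# Possible use (requires SoX; the file name is a placeholder):
#   sox my.wav -n spectrogram -y "$(best_y 200)"
```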

       speed factor[c]
	      Adjust the audio speed (pitch and	tempo  together).   factor  is
	      either the ratio of the new speed	to the old speed: greater than
	      1	speeds up, less	than 1 slows down, or, if  appended  with  the
	      letter  `c',  the	number of cents	(i.e. 100ths of	a semitone) by
	      which the	pitch (and tempo) should be adjusted: greater  than  0
	      increases, less than 0 decreases.
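
	      The two forms of factor are related by the usual definition of a
	      cent (1200 cents per octave, with one octave doubling the
	      frequency).  The following Bourne shell sketch converts cents to
	      the equivalent speed ratio; the function name cents_to_ratio is
	      hypothetical (it is not part of SoX).

```shell
#!/bin/sh
# cents_to_ratio CENTS
# Hypothetical helper (not part of SoX): converts a shift in cents to
# the equivalent speed ratio, 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.6f\n", exp(log(2) * c / 1200) }'
}

cents_to_ratio 1200     # one octave up
cents_to_ratio -1200    # one octave down
cents_to_ratio 100      # one semitone up
```

	      On this definition, speed 100c corresponds to a ratio of about
	      1.059463.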

	      Technically,  the	 speed effect only changes the sample rate in-
	      formation, leaving the samples themselves	untouched.   The  rate
	      effect is	invoked	automatically to resample to the output	sample
	      rate, using its default quality/speed.  For  higher  quality  or
	      higher  speed resampling,	in addition to the speed effect, spec-
	      ify the rate effect with the desired quality option.

	      See also the bend, pitch,	and tempo effects.

       splice  [-h|-t|-q] { position(=)[,excess[,leeway]] }
	      Splice together audio sections.  This effect provides two	things
	      over simple audio	concatenation: a (usually short) cross-fade is
	      applied at the join, and a wave similarity comparison is made to
	      help determine the best place at which to	make the join.

	      One of the options -h, -t, or -q may be given to select the fade
	      envelope as half-cosine wave (the	default),  triangular  (a.k.a.
	      linear), or quarter-cosine wave respectively.

		     Type   Audio	   Fade	level	    Transitions
		      t	    correlated	   constant gain    abrupt
		      h	    correlated	   constant gain    smooth
		      q	    uncorrelated   constant power   smooth

	      To perform a splice, first use the trim effect to	select the au-
	      dio sections to be joined	together.  As when performing  a  tape
	      splice,  the  end	 of  the  section to be	spliced	onto should be
	      trimmed with a small excess (default 0.005 seconds) of audio af-
	      ter the ideal joining point.  The	beginning of the audio section
	      to splice	on should be trimmed with the same excess (before  the
	      ideal  joining  point), plus an additional leeway	(default 0.005
	      seconds).	 Any time specification	may be used for	these  parame-
	      ters.  SoX should	then be	invoked	with the two audio sections as
	      input files and the splice effect	given  with  the  position  at
	      which to perform the splice - this is the length of the first audio
	      section (including the excess).

	      The following diagram uses the tape analogy  to  illustrate  the
	      splice  operation.   The	effect simulates the diagonal cuts and
	      joins the	two pieces:

		    length1   excess
		  _________   :	  :  _________________
			   \  :	  : :\	   `
			    \ :	  : : \	    `
			     \:	  : :  \     `
			      *	  : :	* - - *
			       \  : :	:\     `
				\ : :	: \	`
		  _______________\: :	:  \_____`____
				    :	:   :	  :
				    <--->   <----->
				    excess  leeway

	      where * indicates	the joining points.

	      For example, a long song begins with two verses which start  (as
	      determined  e.g. by using	the play command with the trim (start)
	      effect) at times 0:30.125	and 1:03.432.  The following  commands
	      cut out the first	verse:
		 sox too-long.wav part1.wav trim 0 30.130
	      (5 ms excess, after the first verse starts)
		 sox too-long.wav part2.wav trim 1:03.422
	      (5 ms excess plus	5 ms leeway, before the	second verse starts)
		 sox part1.wav part2.wav just-right.wav	splice 30.130
	      For another example, the SoX command
		 play "|sox -n -p synth	1 sin %1" "|sox	-n -p synth 1 sin %3"
	      generates	and plays two notes, but there is a nasty click	at the
	      transition; the click can	be removed by splicing instead of con-
	      catenating the audio, i.e. by appending splice 1 to the command.
	      (Clicks at the beginning and end of the audio can	be removed  by
	      preceding	the splice effect with fade q .01 2 .01).
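
              The trim positions in the first example above follow directly
              from the excess and leeway rules.  The sketch below reproduces
              that arithmetic; splice_points is a hypothetical helper (not
              part of SoX), and it assumes both verse start times are given
              in plain seconds.  The first value printed is both the end of
              the first section and the splice position.

```shell
# splice_points START1 START2 [EXCESS [LEEWAY]]
# Hypothetical helper: prints the trim end for part 1 (also the splice
# position) and the trim start for part 2, both in seconds.
# Assumes START1 and START2 are plain seconds, not mm:ss.
splice_points() {
    awk -v s1="$1" -v s2="$2" -v e="${3:-0.005}" -v l="${4:-0.005}" \
        'BEGIN { printf "%.3f %.3f\n", s1 + e, s2 - e - l }'
}

splice_points 30.125 63.432    # prints: 30.130 63.422
```

              Here 63.432 is 1:03.432 expressed in seconds, so the second
              value corresponds to the 1:03.422 used above.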

	      Provided your arithmetic is good enough, multiple	splices	can be
	      performed	with a single splice invocation.  For example:
	      #	Audio Copy and Paste Over
	      #	acpo infile copy-start copy-stop paste-over-start outfile
	      #	No chained time	specifications allowed for the parameters
	      #	(i.e. such that	contain	+/-).
	      e=0.005			   # Using default excess
	      l=$e			   # and leeway.
	      sox "$1" piece.wav trim $2-$e-$l =$3+$e
	      sox "$1" part1.wav trim 0	$4+$e
	      sox "$1" part2.wav trim $4+$3-$2-$e-$l
	      sox part1.wav piece.wav part2.wav	"$5" \
		 splice	$4+$e +$3-$2+$e+$l+$e
	      In the above Bourne shell	script,	two splices are	used to	 `copy
	      and paste' audio.

				    *	     *	      *

	      It is also possible to use this effect to	perform	general	cross-
              fades, e.g. to join two songs.  In this case, excess would typically
              be a number of seconds, the -q option would typically be
	      given (to	select an `equal power'	cross-fade), and leeway	should
	      be  zero (which is the default if	-q is given).  For example, if
	      f1.wav and f2.wav	are audio files	to be cross-faded, then
		 sox f1.wav f2.wav out.wav splice -q $(soxi -D f1.wav),3
	      cross-fades the files where the point of	equal  loudness	 is  3
	      seconds  before  the end of f1.wav, i.e. the total length	of the
	      cross-fade is 2 x	3 = 6 seconds (Note: the  $(...)  notation  is
	      POSIX shell).

       stat [-s	scale] [-rms] [-freq] [-v] [-d]
	      Display  time and	frequency domain statistical information about
	      the audio.  Audio	is passed unmodified through the SoX  process-
	      ing chain.

              The information is output to the `standard error' (stderr) stream
              and is calculated as follows, where n is the duration of the audio
              in samples, c is the number of audio channels, r is the audio
              sample rate, and x[k] represents the PCM value (in the range -1 to
              +1 by default) of each successive sample in the audio:

         Samples read        n x c
         Length (seconds)    n / r
         Scaled by                                                   See -s below.
         Maximum amplitude   max(x[k])                               The maximum sample value in
                                                                     the audio; usually this will
                                                                     be a positive number.
         Minimum amplitude   min(x[k])                               The minimum sample value in
                                                                     the audio; usually this will
                                                                     be a negative number.
         Midline amplitude   (min(x[k]) + max(x[k])) / 2
         Mean norm           (1/n) * sum(|x[k]|)                     The average of the absolute
                                                                     value of each sample in the
                                                                     audio.
         Mean amplitude      (1/n) * sum(x[k])                       The average of each sample in
                                                                     the audio.  If this figure is
                                                                     non-zero, then it indicates
                                                                     the presence of a D.C. offset
                                                                     (which could be removed using
                                                                     the dcshift effect).
         RMS amplitude       sqrt((1/n) * sum(x[k]^2))               The level of a D.C. signal
                                                                     that would have the same
                                                                     power as the audio's average
                                                                     power.
         Maximum delta       max(|x[k] - x[k-1]|)
         Minimum delta       min(|x[k] - x[k-1]|)
         Mean delta          (1/(n-1)) * sum(|x[k] - x[k-1]|)
         RMS delta           sqrt((1/(n-1)) * sum((x[k] - x[k-1])^2))
         Rough frequency                                             In Hz.

         Volume Adjustment                                           The parameter to the vol
                                                                     effect which would make the
                                                                     audio as loud as possible
                                                                     without clipping.  Note: See
                                                                     the discussion on Clipping
                                                                     above for reasons why it is
                                                                     rarely a good idea actually
                                                                     to do so.

	      Note  that  the delta measurements are not applicable for	multi-
	      channel audio.
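
              As an illustration of the amplitude definitions in the table
              above, the following sketch computes four of them for a short
              list of sample values.  stat_sketch is a hypothetical helper
              (not part of SoX); it assumes single-channel samples in the
              range -1 to +1, one per line on standard input.

```shell
# stat_sketch: hypothetical helper that reads one sample value per line
# and prints four of the figures defined for the stat effect.
stat_sketch() {
    awk '
        NR == 1 { min = $1; max = $1 }
        {
            if ($1 < min) min = $1
            if ($1 > max) max = $1
            a = ($1 < 0) ? -$1 : $1
            norm += a; sum += $1; sq += $1 * $1
        }
        END {
            if (NR == 0) exit 1
            printf "Midline amplitude: %.6f\n", min / 2 + max / 2
            printf "Mean norm:         %.6f\n", norm / NR
            printf "Mean amplitude:    %.6f\n", sum / NR
            printf "RMS amplitude:     %.6f\n", sqrt(sq / NR)
        }'
}

# Four samples with a deliberate D.C. offset of 0.125
printf '%s\n' 0.5 -0.25 0.5 -0.25 | stat_sketch
```

              The non-zero mean amplitude (0.125) in this example is the
              D.C. offset that the table describes.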

	      The -s option can	be used	to scale the input  data  by  a	 given
	      factor.  The default value of scale is 2147483647	(i.e. the max-
	      imum value of a 32-bit signed integer).  Internal	effects	always
	      work with	signed long PCM	data and so the	value should relate to
	      this fact.

	      The -rms option will convert all output average values to	 `root
	      mean square' format.

	      The -v option displays only the `Volume Adjustment' value.

	      The  -freq  option  calculates  the input's power	spectrum (4096
	      point DFT) instead of the	statistics listed above.  This	should
	      only be used with	a single channel audio file.

	      The  -d option displays a	hex dump of the	32-bit signed PCM data
	      audio in SoX's internal buffer.  This is	mainly	used  to  help
	      track  down  endian problems that	sometimes occur	in cross-plat-
	      form versions of SoX.

	      See also the stats effect.

       stats [-b bits|-x bits|-s scale]	[-w window-time]
	      Display time domain  statistical	information  about  the	 audio
	      channels;	 audio is passed unmodified through the	SoX processing
	      chain.  Statistics are calculated	and displayed for  each	 audio
	      channel and, where applicable, an	overall	figure is also given.

	      For example, for a typical well-mastered stereo music file:

				       Overall	   Left	     Right
			  DC offset   0.000803 -0.000391  0.000803
			  Min level  -0.750977 -0.750977 -0.653412
			  Max level   0.708801	0.708801  0.653534
			  Pk lev dB	 -2.49	   -2.49     -3.69
			  RMS lev dB	-19.41	  -19.13    -19.71
			  RMS Pk dB	-13.82	  -13.82    -14.38
			  RMS Tr dB	-85.25	  -85.25    -82.66
			  Crest	factor	     -	    6.79      6.32
			  Flat factor	  0.00	    0.00      0.00
			  Pk count	     2	       2	 2
			  Bit-depth	 16/16	   16/16     16/16
			  Num samples	 7.72M
			  Length s     174.973
			  Scale	max   1.000000
			  Window s	 0.050

	      DC offset,  Min level,  and  Max level are shown,	by default, in
              the range +-1.  If the -b (bits) option is given, then these
	      three  measurements  will	be scaled to a signed integer with the
	      given number of bits; for	example, for 16	bits, the scale	 would
	      be  -32768  to +32767.  The -x option behaves the	same way as -b
	      except that the signed integer values are	displayed in hexadeci-
	      mal.   The  -s  option  scales the three measurements by a given
	      floating-point number.

	      Pk lev dB	and RMS	lev dB are standard peak and  RMS  level  mea-
	      sured in dBFS.  RMS Pk dB	and RMS	Tr dB are peak and trough val-
	      ues for RMS level	measured over a	short window (default 50ms).

	      Crest factor is the standard ratio of peak to RMS	 level	(note:
	      not in dB).
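
              The dB figures in the example table can be cross-checked by
              hand.  The sketch below assumes the usual definition of dBFS
              for a linear level relative to a full scale of 1, namely
              20 x log10(level); to_dbfs is a hypothetical helper, not part
              of SoX.

```shell
# to_dbfs LEVEL: hypothetical helper that converts a linear level
# (full scale = 1) to dBFS, assuming dBFS = 20 * log10(|LEVEL|).
to_dbfs() {
    awk -v x="$1" 'BEGIN {
        if (x < 0) x = -x            # use the magnitude
        printf "%.2f\n", 20 * log(x) / log(10)
    }'
}

to_dbfs -0.750977    # left channel Min level;  prints -2.49
to_dbfs 0.653534     # right channel Max level; prints -3.69
```

              Both results agree with the Pk lev dB row of the example,
              which suggests that the peak level is taken from whichever
              of Min level and Max level has the larger magnitude.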

	      Flat factor  is a	measure	of the flatness	(i.e. consecutive sam-
	      ples with	the same value)	of the signal at its peak levels (i.e.
	      either  Min level, or Max	level).	 Pk count is the number	of oc-
	      casions (not the number of samples) that the signal attained ei-
	      ther Min level, or Max level.

	      The  right-hand  Bit-depth  figure is the	standard definition of
	      bit-depth	i.e. bits less significant than	the given  number  are
	      fixed  at	zero.  The left-hand figure is the number of most sig-
	      nificant bits that are fixed at zero (or one for	negative  num-
	      bers)  subtracted	 from  the  right-hand figure (the number sub-
	      tracted is directly related to Pk	lev dB).

	      For multi-channel	audio, an overall figure for each of the above
	      measurements  is	given  and derived from	the channel figures as
	      follows: DC offset:  maximum  magnitude;	Max level,  Pk lev dB,
	      RMS Pk dB,  Bit-depth:  maximum;	Min level, RMS Tr dB: minimum;
              RMS lev dB, Flat factor, Pk count:  average;  Crest factor:  not
              applicable.

              Length s is the duration in seconds of the audio, and Num samples
              is equal to the sample-rate multiplied by Length s.
              Scale max is the scaling applied to the first three measurements;
              specifically, it is the maximum value that could apply to
	      Max level.   Window s  is	 the length of the window used for the
	      peak and trough RMS measurements.

	      See also the stat	effect.

       swap   Swap stereo channels.  If	the input  is  not  stereo,  pairs  of
	      channels	are  swapped,  and  a possible odd last	channel	passed
	      through.	E.g., for seven	channels, the output order will	be  2,
	      1, 4, 3, 6, 5, 7.

	      See  also	 remix for an effect that allows arbitrary channel se-
	      lection and ordering (and	mixing).
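
              The pairwise ordering described above can be generated for
              any channel count.  swap_order is a hypothetical helper (not
              part of SoX) that prints the output order; the resulting list
              has the form of a remix channel specification, so it is
              assumed that passing it to remix would give the same channel
              order.

```shell
# swap_order N: hypothetical helper that prints the channel order
# produced by swapping channels pairwise for an N-channel input.
swap_order() {
    awk -v n="$1" 'BEGIN {
        sep = ""
        for (i = 1; i + 1 <= n; i += 2) {   # swap each complete pair
            printf "%s%d %d", sep, i + 1, i
            sep = " "
        }
        if (n % 2 == 1)                     # an odd last channel passes through
            printf "%s%d", sep, n
        printf "\n"
    }'
}

swap_order 7    # prints: 2 1 4 3 6 5 7
```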

       stretch factor [window fade shift fading]
	      Change the audio duration	(but not its pitch).  This  effect  is
	      broadly  equivalent  to  the  tempo effect with (factor inverted
	      and) search set to zero, so in general, its results are compara-
	      tively  poor;  it	 is  retained  as it can sometimes out-perform
	      tempo for	small factors.

              factor is the stretch factor: >1 lengthens the audio, <1 shortens
              it.  window is the window size in ms; the default is 20 ms.  The
              fade option can be `lin'.  shift is the shift ratio, in the range
              [0, 1]; its default depends on factor: 1 to shorten, 0.8 to
              lengthen.  fading is the fading ratio, in the range [0, 0.5]; its
              default depends on factor and shift.

	      See also the tempo effect.

       synth [-j KEY] [-n] [len	[off [ph [p1 [p2 [p3]]]]]] {[type] [combine]
       [[%]freq[k][:|+|/|-[%]freq2[k]]]	[off [ph [p1 [p2 [p3]]]]]}
	      This effect can be used to generate fixed	or swept frequency au-
	      dio tones	with various wave shapes,  or  to  generate  wide-band
	      noise  of	various	`colours'.  Multiple synth effects can be cas-
	      caded to produce more complex waveforms; at  each	 stage	it  is
	      possible	to choose whether the generated	waveform will be mixed
	      with, or modulated onto the output from the previous stage.  Au-
	      dio  for	each channel in	a multi-channel	audio file can be syn-
	      thesised independently.

	      Though this effect is used to generate audio, an input file must
	      still be given, the characteristics of which will	be used	to set
	      the synthesised audio length, the	number of  channels,  and  the
	      sampling rate; however, since the	input file's audio is not nor-
	      mally needed, a `null file' (with	the special name -n) is	 often
	      given  instead (and the length specified as a parameter to synth
	      or by another given effect that has an associated	length).

	      For example, the following produces a  3	second,	 48kHz,	 audio
	      file containing a	sine-wave swept	from 300 to 3300 Hz:
		 sox -n	output.wav synth 3 sine	300-3300
	      and this produces	an 8 kHz version:
		 sox -r	8000 -n	output.wav synth 3 sine	300-3300
	      Multiple	channels  can  be synthesised by specifying the	set of
	      parameters shown between braces multiple	times;	the  following
	      puts  the	 swept tone in the left	channel	and adds `brown' noise
	      in the right:
		 sox -n	output.wav synth 3 sine	300-3300 brownnoise
	      The following example shows how two synth	effects	 can  be  cas-
	      caded to create a	more complex waveform:
		 play -n synth 0.5 sine	200-500	synth 0.5 sine fmod 700-100
	      Frequencies can also be given in `scientific' note notation, or,
	      by prefixing a `%' character, as a number	of semitones  relative
	      to  `middle  A'  (440 Hz).   For example,	the following could be
	      used to help tune	a guitar's low `E' string:
		 play -n synth 4 pluck %-29
	      or with a	(Bourne	shell) loop, the whole guitar:
		 for n in E2 A2	D3 G3 B3 E4; do
		   play	-n synth 4 pluck $n repeat 2; done
	      See the delay effect (above) and the reference to	`SoX scripting
	      examples'	(below)	for more synth examples.
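
              To see which frequency a `%' semitone value corresponds to,
              the relation f = 440 x 2^(n/12) can be evaluated directly.
              This sketch assumes equal temperament (the default tuning);
              semitones_to_hz is a hypothetical helper, not part of SoX.

```shell
# semitones_to_hz N: hypothetical helper that prints the frequency of
# the note N semitones away from A (440 Hz), assuming equal temperament.
semitones_to_hz() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ (n / 12) }'
}

semitones_to_hz -29   # the guitar low E used above; prints 82.41
semitones_to_hz 0     # prints 440.00
```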

	      N.B.   This  effect  generates  audio at maximum volume (0dBFS),
	      which means that there is	a high chance of clipping  when	 using
	      the  audio subsequently, so in many cases, you will want to fol-
	      low this effect with the gain effect to prevent this  from  hap-
	      pening.  (See  also Clipping above.)  Note that, by default, the
	      synth effect incorporates	the functionality of gain -h (see  the
	      gain effect for details);	synth's	-n option may be given to dis-
	      able this	behaviour.

	      A	detailed description of	each synth parameter follows:

              len is the length of audio to synthesise (any time specification);
              a value of 0 indicates that the input length should be used, which
              is also the default.

              type is one of sine, square, triangle, sawtooth, trapezium, exp,
              [white]noise, tpdfnoise, pinknoise, brownnoise, pluck;
              default=sine.

	      combine is one of	create,	mix, amod (amplitude modulation), fmod
	      (frequency modulation); default=create.

	      freq/freq2 are the frequencies at	the beginning/end of synthesis
	      in Hz  or,  if  preceded	with  `%',  semitones  relative	 to  A
	      (440 Hz);	 alternatively,	 `scientific'  note notation (e.g. E2)
	      may be used.  The	default	frequency is 440Hz.  By	 default,  the
	      tuning  used with	the note notations is `equal temperament'; the
	      -j KEY option selects `just intonation', where KEY is an integer
	      number  of  semitones relative to	A (so for example, -9 or 3 se-
	      lects the	key of C), or a	note in	scientific notation.

	      If freq2 is given, then len must also have been  given  and  the
	      generated	tone will be swept between the given frequencies.  The
	      two given	frequencies must be separated by one of	the characters
	      `:',  `+',  `/',	or `-'.	 This character	is used	to specify the
	      sweep function as	follows:

	      :	     Linear: the tone will change by a fixed number  of	 hertz
		     per second.

              +      Square: a second-order function is used to change the
                     tone.

	      /	     Exponential: the tone will	change by a  fixed  number  of
		     semitones per second.

	      -	     Exponential:  as  `/', but	initial	phase always zero, and
		     stepped (less smooth) frequency changes.

	      Not used for noise.

	      off is the bias (DC-offset) of the signal	in percent; default=0.

	      ph is the	phase shift in percentage of 1 cycle; default=0.   Not
	      used for noise.

	      p1  is  the  percentage  of each cycle that is `on' (square), or
	      `rising' (triangle, exp, trapezium); default=50 (square,	trian-
	      gle,  exp),  default=10  (trapezium),  or	 sustain  (pluck); de-

	      p2 (trapezium): the  percentage  through	each  cycle  at	 which
	      `falling'	begins;	default=50. exp: the amplitude in multiples of
	      2dB; default=50, or tone-1 (pluck); default=20.

	      p3 (trapezium): the  percentage  through	each  cycle  at	 which
	      `falling'	ends; default=60, or tone-2 (pluck); default=90.

       tempo [-q] [-m|-s|-l] factor [segment [search [overlap]]]
	      Change  the  audio playback speed	but not	its pitch. This	effect
	      uses the WSOLA algorithm.	The audio is chopped up	into  segments
	      which are	then shifted in	the time domain	and overlapped (cross-
	      faded) at	points where their waveforms are most similar  as  de-
	      termined by measurement of `least	squares'.

	      By  default,  linear searches are	used to	find the best overlap-
	      ping points.  If	the  optional  -q  parameter  is  given,  tree
	      searches	are  used  instead.  This  makes  the effect work more
	      quickly, but the result may not sound as good. However,  if  you
	      must  improve  the  processing speed, this generally reduces the
	      sound quality less than reducing the search or overlap values.

	      The -m option is used to optimize	 default  values  of  segment,
	      search and overlap for music processing.

	      The  -s  option  is  used	to optimize default values of segment,
	      search and overlap for speech processing.

	      The -l option is used to optimize	 default  values  of  segment,
	      search  and  overlap for `linear'	processing that	tends to cause
	      more noticeable distortion but may  be  useful  when  factor  is
	      close to 1.

	      If -m, -s, or -l is specified, the default value of segment will
	      be calculated based on factor, while default search and  overlap
	      values  are based	on segment. Any	values you provide still over-
	      ride these default values.

	      factor gives the ratio of	new tempo to the old  tempo,  so  e.g.
	      1.1 speeds up the	tempo by 10%, and 0.9 slows it down by 10%.
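
              Because factor is the ratio of new tempo to old tempo, the
              output duration should be the input duration divided by
              factor.  The sketch below shows that arithmetic only;
              tempo_length is a hypothetical helper, not part of SoX.

```shell
# tempo_length SECONDS FACTOR: hypothetical helper that prints the
# expected output duration, assuming duration scales as 1/FACTOR.
tempo_length() {
    awk -v d="$1" -v f="$2" 'BEGIN { printf "%.3f\n", d / f }'
}

tempo_length 60 1.25   # faster tempo, shorter audio; prints 48.000
tempo_length 60 0.8    # slower tempo, longer audio;  prints 75.000
```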

	      The  optional  segment parameter selects the algorithm's segment
	      size in milliseconds.  If	no other flags are specified, the  de-
	      fault  value  is	82  and	 is  typically	suited to making small
	      changes to the tempo of music. For larger	changes	(e.g. a	factor
	      of 2), 41	ms may give a better result.  The -m, -s, and -l flags
	      will cause the segment  default  to  be  automatically  adjusted
	      based on factor.	For example using -s (for speech) with a tempo
	      of 1.25 will calculate a default segment value of	32.

	      The optional search parameter gives the  audio  length  in  mil-
	      liseconds	 over  which the algorithm will	search for overlapping
	      points.  If no other flags are specified,	the default  value  is
	      14.68.   Larger  values  use more	processing time	and may	or may
	      not produce better results.  A practical	maximum	 is  half  the
	      value  of	 segment. Search can be	reduced	to cut processing time
	      at the risk of degrading output quality.	The  -m,  -s,  and  -l
	      flags will cause the search default to be	automatically adjusted
	      based on segment.

	      The optional overlap parameter gives the segment overlap	length
	      in  milliseconds.	  Default value	is 12, but -m, -s, or -l flags
	      automatically adjust overlap based on segment  size.  Increasing
	      overlap  increases  processing  time and may increase quality. A
              practical maximum for overlap is the value of search, with overlap
              typically being (at least) a little smaller than search.

	      See  also	 speed	for an effect that changes tempo and pitch to-
	      gether, pitch and	bend for effects that change pitch  only,  and
              stretch for an effect that changes tempo using a different
              algorithm.

       treble gain [frequency[k] [width[s|h|k|o|q]]]
	      Apply a treble tone-control effect.  See the description of  the
	      bass effect for details.

       tremolo speed [depth]
	      Apply  a	tremolo	(low frequency amplitude modulation) effect to
	      the audio.  The tremolo frequency	in Hz is given by  speed,  and
	      the depth	as a percentage	by depth (default 40).

       trim {position(+)}
	      Cuts  portions out of the	audio.	Any number of positions	may be
	      given; audio is not sent to the output until the first  position
	      is reached.  The effect then alternates between copying and dis-
	      carding audio at each position.  Using a	value  of  0  for  the
	      first  position  parameter  allows copying from the beginning of
	      the audio.

	      For example,
		 sox infile outfile trim 0 10
	      will copy	the first ten seconds, while
		 play infile trim 12:34	=15:00 -2:00
		 play infile trim 12:34	2:26 -2:00
	      will both	play from 12 minutes 34	seconds	into the audio	up  to
	      15  minutes into the audio (i.e. 2 minutes and 26	seconds	long),
	      then resume playing two minutes before the end of	audio.
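
              The two play commands above are equivalent because =15:00 is
              an absolute position while 2:26 is a length measured from the
              previous position.  The arithmetic can be checked with a
              small sketch; to_seconds is a hypothetical helper (not part
              of SoX) that handles only colon-separated whole-number
              fields.

```shell
# to_seconds [[HH:]MM:]SS: hypothetical helper that converts a
# colon-separated time to a number of seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

start=$(to_seconds 12:34)    # 754
stop=$(to_seconds 15:00)     # 900
echo "$((stop - start))"     # prints 146, i.e. 2:26
```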

       upsample	[factor]
	      Upsample the signal by an	integer	 factor:  factor-1  zero-value
	      samples  are  inserted between each pair of input	samples.  As a
	      result, the original spectrum is replicated into	the  new  fre-
	      quency  space (imaging) and attenuated.  This attenuation	can be
	      compensated for by adding	vol factor after any further  process-
	      ing.   The upsample effect is typically used in combination with
	      filtering	effects.
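
              The zero insertion described above can be pictured with a
              few sample values.  zero_stuff is a hypothetical helper (not
              part of SoX) that works on plain numbers, one per line,
              rather than on audio files; it inserts factor-1 zeros between
              each pair of input values.

```shell
# zero_stuff FACTOR: hypothetical helper that reads one value per line
# and prints them on one line with FACTOR-1 zeros between each pair.
zero_stuff() {
    awk -v f="$1" '
        {
            if (NR > 1)
                for (i = 1; i < f; i++) printf " 0"
            printf "%s%s", (NR > 1 ? " " : ""), $1
        }
        END { printf "\n" }'
}

printf '%s\n' 3 6 | zero_stuff 3    # prints: 3 0 0 6
```

              Since most output values are now zero, the average level
              drops, which is the attenuation that vol factor is meant to
              compensate for.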

	      For a general resampling effect  with  anti-imaging,  see	 rate.
	      See also downsample.

       vad [options]
	      Voice  Activity  Detector.   Attempts  to	trim silence and quiet
	      background sounds	from the ends of (fairly high resolution  i.e.
	      16-bit, 44-48kHz)	recordings of speech.  The algorithm currently
	      uses a simple cepstral power measurement to detect voice,	so may
	      be  fooled  by  other  things, especially	music.	The effect can
	      trim only	from the front of the audio, so	in order to trim  from
	      the back,	the reverse effect must	also be	used.  E.g.
		 play speech.wav norm vad
	      to trim from the front,
		 play speech.wav norm reverse vad reverse
	      to trim from the back, and
		 play speech.wav norm vad reverse vad reverse
	      to  trim	from  both ends.  The use of the norm effect is	recom-
	      mended, but remember that	neither	reverse	nor norm  is  suitable
	      for use with streamed audio.

              Default values are shown in parentheses.

	      -t num (7)
		     The measurement level used	to trigger activity detection.
		     This might	need to	be  changed  depending	on  the	 noise
                     level, signal level and other characteristics of the input
                     audio.

	      -T num (0.25)
		     The time constant (in seconds) used to help ignore	 short
		     bursts of sound.

	      -s num (1)
		     The  amount  of  audio  (in  seconds)  to search for qui-
		     eter/shorter bursts of audio to include prior to the  de-
		     tected trigger point.

	      -g num (0.25)
		     Allowed  gap  (in seconds)	between	quieter/shorter	bursts
		     of	audio to include prior to the detected trigger point.

	      -p num (0)
		     The amount	of audio (in seconds) to preserve  before  the
		     trigger point and any found quieter/shorter bursts.

	      Advanced Options:
	      These allow fine tuning of the algorithm's internal parameters.

	      -b num The  algorithm  (internally)  uses	adaptive noise estima-
		     tion/reduction in order to	detect the start of the	wanted
                     audio.  This option sets the time for the initial noise
                     estimate.

	      -N num Time constant used	by the adaptive	 noise	estimator  for
		     when the noise level is increasing.

	      -n num Time  constant  used  by the adaptive noise estimator for
		     when the noise level is decreasing.

	      -r num Amount of noise reduction to use in the  detection	 algo-
		     rithm (e.g. 0, 0.5, ...).

	      -f num Frequency of the algorithm's processing/measurements.

	      -m num Measurement  duration;  by	default, twice the measurement
		     period; i.e.  with	overlap.

	      -M num Time constant used	to smooth spectral measurements.

	      -h num `Brick-wall' frequency of high-pass filter	applied	at the
		     input to the detector algorithm.

	      -l num `Brick-wall'  frequency of	low-pass filter	applied	at the
		     input to the detector algorithm.

	      -H num `Brick-wall' frequency of high-pass lifter	 used  in  the
		     detector algorithm.

	      -L num `Brick-wall' frequency of low-pass	lifter used in the de-
		     tector algorithm.

	      See also the silence effect.

       vol gain	[type [limitergain]]
	      Apply an amplification or	an attenuation to  the	audio  signal.
	      Unlike the -v option (which is used for balancing	multiple input
	      files as they enter the SoX effects processing chain), vol is an
	      effect  like  any	 other so can be applied anywhere, and several
	      times if necessary, during the processing	chain.

	      The amount to change the volume is given by gain which is	inter-
	      preted,  according to the	given type, as follows:	if type	is am-
	      plitude (or is omitted), then gain is an amplitude (i.e. voltage
	      or  linear) ratio, if power, then	a power	(i.e. wattage or volt-
	      age-squared) ratio, and if dB, then a power change in dB.

	      When type	is amplitude or	power, a gain of 1 leaves  the	volume
	      unchanged,  less	than  1	 decreases  it,	and greater than 1 in-
	      creases it; a negative gain inverts the audio signal in addition
	      to adjusting its volume.

	      When  type  is dB, a gain	of 0 leaves the	volume unchanged, less
	      than 0 decreases it, and greater than 0 increases	it.
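
              The three gain types describe the same change in different
              units.  Taking dB as a power change, as stated above, a gain
              of d dB corresponds to an amplitude ratio of 10^(d/20) and a
              power ratio of 10^(d/10).  db_to_ratios is a hypothetical
              helper (not part of SoX) that prints both.

```shell
# db_to_ratios DB: hypothetical helper that prints the amplitude ratio
# and the power ratio corresponding to a gain of DB decibels.
db_to_ratios() {
    awk -v db="$1" 'BEGIN { printf "%.3f %.3f\n", 10 ^ (db / 20), 10 ^ (db / 10) }'
}

db_to_ratios 10    # prints: 3.162 10.000
db_to_ratios -6    # prints: 0.501 0.251
```

              On this basis, vol 10dB, vol 3.162 amplitude and vol 10 power
              would all be expected to request approximately the same
              change.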

	      See [4] for a detailed discussion	on electrical (and hence audio
	      signal) voltage and power	ratios.

              Beware of Clipping when increasing the volume.

	      The gain and the type parameters can be concatenated if desired,
	      e.g.  vol	10dB.

	      An optional limitergain value can	be specified and should	 be  a
	      value  much  less	than 1 (e.g. 0.05 or 0.02) and is used only on
	      peaks to prevent clipping.  Not specifying this  parameter  will
	      cause  no	limiter	to be used.  In	verbose	mode, this effect will
	      display the percentage of	the audio that needed to be limited.

	      See also gain for	a volume-changing effect with different	 capa-
	      bilities,	 and  compand  for  a dynamic-range compression/expan-
	      sion/limiting effect.

       Exit status is 0	for no error, 1	if there is a problem  with  the  com-
       mand-line parameters, or	2 if an	error occurs during file processing.
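
       In a script, the exit status can be turned into a readable message.
       The following is a minimal sketch; sox_status_text is a hypothetical
       helper, not part of SoX, and any status other than 0, 1 or 2 is
       reported as unexpected.

```shell
# sox_status_text STATUS: hypothetical helper that maps an exit status
# to the meanings listed above.
sox_status_text() {
    case "$1" in
        0) echo "no error" ;;
        1) echo "problem with the command-line parameters" ;;
        2) echo "error during file processing" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use (not run here):
#   sox in.wav out.wav trim 0 10; sox_status_text $?
sox_status_text 2    # prints: error during file processing
```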

       Please report any bugs found in this version of SoX to the mailing list

       soxi(1),	soxformat(7), libsox(3)
       audacity(1), gnuplot(1),	octave(1), wget(1)
       The SoX web site	at
       SoX scripting examples at

       [1]    R. Bristow-Johnson, Cookbook formulae for	audio EQ biquad	filter

       [2]    Wikipedia, Q-factor,

       [3]    Scott  Lehman, Effects Explained,

       [4]    Wikipedia, Decibel,

       [5]    Richard  Furse,  Linux  Audio  Developer's  Simple  Plugin  API,

       [6]    Richard Furse, Computer Music Toolkit,

       [7]    Steve Harris, LADSPA plugins,

       Copyright 1998-2013 Chris Bagwell and SoX Contributors.
       Copyright 1991 Lance Norskog and	Sundry Contributors.

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published  by  the
       Free  Software  Foundation;  either  version 2, or (at your option) any
       later version.

       This program is distributed in the hope that it	will  be  useful,  but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
       Public License for more details.

       Chris Bagwell.  Other authors and contributors are listed in the
       ChangeLog file that is distributed with the
       source code.

sox			       December	31, 2014			SoX(1)

