POCKETSPHINX(1)		    General Commands Manual	       POCKETSPHINX(1)

NAME
       pocketsphinx - Run speech recognition on	audio data

SYNOPSIS
       pocketsphinx [ options... ] [ live | single | align | config | help |
       soxflags ] INPUTS...

DESCRIPTION
       The pocketsphinx command-line program reads single-channel 16-bit PCM
       audio from one or more input files (or - to read from standard input),
       and attempts to recognize speech in it using the default acoustic and
       language models.  The input files can be raw audio, WAV, or NIST
       Sphere files, though some of these may not be recognized properly.  It
       accepts a large number of options which you probably don't care about,
       and a command which defaults to live.  The commands are as follows:

       help   Print a long list	of those options you don't care	about.

       config Dump  configuration  as  JSON  to	standard output	(can be	loaded
	      with the -config option).

       live   Detect speech segments in	input files, run recognition  on  them
	      (using  those  options  you don't	care about), and write the re-
	      sults to standard	output in line-delimited JSON. I realize  this
	      isn't  the  prettiest  format,  but it sure beats	XML. Each line
	      contains a JSON object with these	fields,	which have short names
	      to make the lines more readable:

	      "b": Start time in seconds, from the beginning of the stream

	      "d": Duration in seconds

	      "p": Estimated probability of the recognition result, i.e. a
	      number between 0 and 1 which may be used as a confidence score

	      "t": Full text of recognition result

	      "w": List of segments (usually words), each of which in turn
	      contains the b, d, p, and t fields, for start, duration,
	      probability, and the text of the word.  A segment may itself
	      contain a w field with sub-segments; for example, with
	      -phone_align yes each word lists its phones (see align below).

       single Recognize	 the input as a	single utterance, and write a JSON ob-
	      ject in the same format described	above.

       align
	      Align a single input file	(or - for standard input)  to  a  word
	      sequence,	 and  write a JSON object in the same format described
	      above.  The first	positional argument is the input, and all sub-
	      sequent ones are concatenated to make the	text,  to  avoid  sur-
	      prises  if you forget to quote it.  You are responsible for nor-
	      malizing the text	to remove punctuation, uppercase,  centipedes,
	      etc. For example:

		  pocketsphinx align goforward.wav "go forward ten meters"

	      By  default,  only  word-level  alignment	is done.  To get phone
	      alignments, pass `-phone_align yes` in the flags,	e.g.:

		  pocketsphinx -phone_align yes	align audio.wav	$text

	      This produces output that is not particularly readable, but you
	      can use jq (https://stedolan.github.io/jq/) to clean it up.  For
	      example, you can get just the word names and start times like
	      this:

		  pocketsphinx align audio.wav $text | jq '.w[]|[.t,.b]'

	      Or you could get the phone names and durations like this:

		  pocketsphinx -phone_align yes	align audio.wav	$text |	jq '.w[]|.w[]|[.t,.d]'

	      There are	many, many other possibilities,	of course.

       soxflags
	      Print arguments for sox which will create the appropriate input
	      format.  Note that because the sox command line is slightly
	      quirky, these must always come after the input filename or -d
	      (which tells sox to read from the microphone).  You can run live
	      recognition like this:

		  sox -d $(pocketsphinx	soxflags) | pocketsphinx -

	      or decode	from a file named "audio.mp3" like this:

		  sox audio.mp3	$(pocketsphinx soxflags) | pocketsphinx	-
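       The same pipeline can also be driven from Python.  The following is a
       minimal sketch, assuming both sox and pocketsphinx are on PATH.  The
       function names are hypothetical.  Like the shell examples, it asks
       pocketsphinx soxflags for the flags at run time instead of
       hard-coding them, and it places them after the input name.

```python
import shlex
import subprocess
from typing import List


def sox_command(infile: str, soxflags: str) -> List[str]:
    """Build the sox argument list, putting the flags AFTER the input.

    infile may be a filename or "-d" (sox reads from the microphone).
    """
    return ["sox", infile, *shlex.split(soxflags)]


def decode_with_sox(infile: str) -> str:
    """Rough equivalent of: sox INFILE $(pocketsphinx soxflags) | pocketsphinx -

    Returns pocketsphinx's line-delimited JSON output as a single string.
    """
    # Ask pocketsphinx for the right flags rather than guessing a sample rate.
    flags = subprocess.run(
        ["pocketsphinx", "soxflags"],
        check=True, stdout=subprocess.PIPE, text=True,
    ).stdout
    sox = subprocess.Popen(sox_command(infile, flags), stdout=subprocess.PIPE)
    ps = subprocess.Popen(
        ["pocketsphinx", "-"], stdin=sox.stdout,
        stdout=subprocess.PIPE, text=True,
    )
    # Drop our copy of the pipe so sox sees a broken pipe if pocketsphinx exits.
    sox.stdout.close()
    output, _ = ps.communicate()
    sox.wait()
    if ps.returncode != 0 or sox.returncode != 0:
        raise RuntimeError(
            f"pipeline failed (sox={sox.returncode}, pocketsphinx={ps.returncode})"
        )
    return output
```

       Note that decode_with_sox("-d") only returns once sox stops recording.
       To process microphone results as they arrive, read ps.stdout line by
       line instead of calling communicate().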

       By default only errors are printed to standard error, but if you want
       more information you can pass -loglevel INFO.  Partial results are not
       printed; maybe they will be in the future, but don't hold your breath.
       Force-alignment is supported by the align command described above.
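       The live, single, and align commands all write JSON objects with the
       b, d, p, t, and w fields described above.  If you would rather
       consume them from a program than with jq, here is a minimal Python
       sketch.  The class and function names, and the 0.5 confidence
       cut-off, are hypothetical.  It assumes every segment carries all four
       of b, d, p, and t, as documented.  Nested w lists, for example phones
       under -phone_align yes, are parsed recursively.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Segment:
    """A result or sub-segment; attribute names spell out the short JSON keys."""
    start: float     # "b": start time in seconds from the beginning of the stream
    duration: float  # "d": duration in seconds
    prob: float      # "p": estimated probability (confidence), 0 to 1
    text: str        # "t": recognized text
    children: List["Segment"] = field(default_factory=list)  # "w", if present

    @property
    def end(self) -> float:
        # The JSON gives start and duration, not end, so compute it.
        return self.start + self.duration


def to_segment(obj: dict) -> Segment:
    """Convert one decoded JSON object, recursing into its "w" list."""
    return Segment(
        start=obj["b"],
        duration=obj["d"],
        prob=obj["p"],
        text=obj["t"],
        children=[to_segment(sub) for sub in obj.get("w", [])],
    )


def parse_results(lines: Iterable[str]) -> Iterator[Segment]:
    """Parse line-delimited JSON (one object per line), skipping blank lines."""
    for line in lines:
        if line.strip():
            yield to_segment(json.loads(line))


def confident_lines(segments: Iterable[Segment], threshold: float = 0.5) -> List[str]:
    """Format results whose confidence is at least the (hypothetical) threshold."""
    return [
        f"{seg.start:.2f}-{seg.end:.2f} {seg.text}"
        for seg in segments
        if seg.prob >= threshold
    ]
```

       To use it on live output, pass sys.stdin (or a subprocess's stdout) to
       parse_results.  For align, whose output is a single object,
       parse_results([output]) works as well.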

OPTIONS
       -agc   Automatic	gain  control  for  c0	('max',	 'emax',  'noise',  or
	      'none')

       -agcthresh
	      Initial threshold	for automatic gain control

       -allphone
	      Perform phoneme decoding with phonetic lm (given here)

       -allphone_ci
	      Perform  phoneme	decoding with phonetic lm and context-indepen-
	      dent units only

       -alpha Preemphasis parameter

       -ascale
	      Inverse of acoustic model	scale for confidence score calculation

       -aw    Inverse weight applied to	acoustic scores.

       -backtrace
	      Print results and	backtraces to log.

       -beam  Beam width applied to every frame	 in  Viterbi  search  (smaller
	      values mean wider	beam)

       -bestpath
	      Run bestpath (Dijkstra) search over word lattice (3rd pass)

       -bestpathlw
	      Language model probability weight	for bestpath search

       -ceplen
	      Number of	components in the input	feature	vector

       -cmn   Cepstral mean normalization scheme ('live', 'batch', or 'none')

       -cmninit
	      Initial  values  (comma-separated) for cepstral mean when	'live'
	      is used

       -compallsen
	      Compute all senone scores	in every frame	(can  be  faster  when
	      there are	many senones)

       -dict  Main pronunciation dictionary (lexicon) input file

       -dictcase
	      Dictionary  is  case sensitive (NOTE: case insensitivity applies
	      to ASCII characters only)

       -dither
	      Add 1/2-bit noise

       -doublebw
	      Use double bandwidth filters (same center	freq)

       -ds    Frame GMM	computation downsampling ratio

       -fdict Noise word pronunciation dictionary input file

       -feat  Feature stream type, depends on the acoustic model

       -featparams
	      File containing feature extraction parameters.

       -fillprob
	      Filler word transition probability

       -frate Frame rate (frames per second)

       -fsg   Sphinx-format finite state grammar file

       -fsgusealtpron
	      Add alternate pronunciations to FSG

       -fsgusefiller
	      Insert filler words at each state.

       -fwdflat
	      Run forward flat-lexicon search over word	lattice	(2nd pass)

       -fwdflatbeam
	      Beam width applied to every frame	in second-pass flat search

       -fwdflatefwid
	      Minimum number of	end frames for a word to be searched  in  fwd-
	      flat search

       -fwdflatlw
	      Language	model  probability  weight for flat lexicon (2nd pass)
	      decoding

       -fwdflatsfwin
	      Window of	frames in lattice to search  for  successor  words  in
	      fwdflat search

       -fwdflatwbeam
	      Beam width applied to word exits in second-pass flat search

       -fwdtree
	      Run forward lexicon-tree search (1st pass)

       -hmm   Directory containing acoustic model files.

       -input_endian
	      Endianness  of  input data, big or little, ignored if NIST or MS
	      Wav

       -jsgf  JSGF grammar file

       -keyphrase
	      Keyphrase to spot

       -kws   File with keyphrases to spot, one per line

       -kws_delay
	      Delay to wait for	best detection score

       -kws_plp
	      Phone loop probability for keyphrase spotting

       -kws_threshold
	      Threshold	for p(hyp)/p(alternatives) ratio

       -latsize
	      Initial backpointer table	size

       -lda   File containing transformation matrix to be applied to
	      features (single-stream features only)

       -ldadim
	      Dimensionality of	output of feature transformation (0 to use en-
	      tire matrix)

       -lifter
	      Length of	sin-curve for liftering, or 0 for no liftering.

       -lm    Word trigram language model input file

       -lmctl File specifying a set of language models

       -lmname
	      Name of the language model in -lmctl to use by default

       -logbase
	      Base in which all log-likelihoods are calculated

       -logfn File to write log messages in

       -loglevel
	      Minimum level of log messages (DEBUG, INFO, WARN,	ERROR)

       -logspec
	      Write out	logspectral files instead of cepstra

       -lowerf
	      Lower edge of filters (in Hz)

       -lpbeam
	      Beam width applied to last phone in words

       -lponlybeam
	      Beam width applied to last phone in single-phone words

       -lw    Language model probability weight

       -maxhmmpf
	      Maximum number of	active HMMs to maintain	at each	frame  (or  -1
	      for no pruning)

       -maxwpf
	      Maximum  number  of distinct word	exits at each frame (or	-1 for
	      no pruning)

       -mdef  Model definition input file

       -mean  Mixture gaussian means input file

       -mfclogdir
	      Directory to log feature files to

       -min_endfr
	      Nodes are ignored in lattice construction if they persist for
	      fewer than this many frames

       -mixw  Senone mixture weights input file (uncompressed)

       -mixwfloor
	      Senone mixture weights floor (applied to data from -mixw file)

       -mllr  MLLR transformation to apply to means and variances

       -mmap  Use memory-mapped	I/O (if	possible) for model files

       -ncep  Number of cepstral coefficients

       -nfft  Size of FFT, or 0	to set automatically (recommended)

       -nfilt Number of	filter banks

       -nwpen New word transition penalty

       -pbeam Beam width applied to phone transitions

       -pip   Phone insertion penalty

       -pl_beam
	      Beam width applied to phone loop search for lookahead

       -pl_pbeam
	      Beam width applied to phone loop transitions for lookahead

       -pl_pip
	      Phone insertion penalty for phone	loop

       -pl_weight
	      Weight for phoneme lookahead penalties

       -pl_window
	      Phoneme lookahead	window size, in	frames

       -rawlogdir
	      Directory to log raw audio files to

       -remove_dc
	      Remove DC	offset from each frame

       -remove_noise
	      Remove noise using spectral subtraction

       -round_filters
	      Round mel	filter frequencies to DFT points

       -samprate
	      Sampling rate (in Hz)

       -seed  Seed  for	 random	 number	generator; if less than	zero, pick our
	      own

       -sendump
	      Senone dump (compressed mixture weights) input file

       -senlogdir
	      Directory to log senone score files to

       -senmgau
	      Senone to codebook mapping input file (usually not needed)

       -silprob
	      Silence word transition probability

       -smoothspec
	      Write out	cepstral-smoothed logspectral files

       -svspec
	      Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)

       -tmat  HMM state transition matrix input file

       -tmatfloor
	      HMM state	transition probability floor (applied to -tmat file)

       -topn  Maximum number of	top Gaussians to use in	scoring.

       -topn_beam
	      Beam width used to determine top-N Gaussians (or	a  list,  per-
	      feature)

       -toprule
	      Start rule for JSGF (first public rule is default)

       -transform
	      Which  type  of  transform  to use to calculate cepstra (legacy,
	      dct, or htk)

       -unit_area
	      Normalize	mel filters to unit area

       -upperf
	      Upper edge of filters (in Hz)

       -uw    Unigram weight

       -var   Mixture gaussian variances input file

       -varfloor
	      Mixture gaussian variance	floor (applied to data from -var file)

       -varnorm
	      Variance normalize each utterance (only if -cmn is 'batch')

       -verbose
	      Show input filenames

       -warp_params
	      Parameters defining the warping function

       -warp_type
	      Warping function type (or	shape)

       -wbeam Beam width applied to word exits

       -wip   Word insertion penalty

       -wlen  Hamming window length (in seconds)
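       The program expects single-channel 16-bit PCM at the rate given by
       -samprate.  It can therefore help to check a WAV file's header before
       passing it in directly.  The following is a minimal Python sketch
       using the standard wave module.  The function name is hypothetical.
       The default of 16000 Hz is an assumption about the default acoustic
       model, so pass the same value you give to -samprate.

```python
import wave
from typing import BinaryIO, Union


def wav_matches(source: Union[str, BinaryIO], samprate: int = 16000) -> bool:
    """Return True if a WAV file is already single-channel 16-bit PCM at samprate.

    source is a path or a binary file object.  samprate=16000 is an
    assumed default; use whatever value you pass to -samprate.
    """
    try:
        with wave.open(source, "rb") as wav:
            return (
                wav.getnchannels() == 1         # single channel
                and wav.getsampwidth() == 2     # 2 bytes = 16 bits per sample
                and wav.getframerate() == samprate
            )
    except (wave.Error, EOFError):
        # Not a RIFF/WAV file, or an encoding the wave module cannot read.
        return False
```

       If it returns False, convert the file with the sox pipeline shown
       under soxflags instead.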

AUTHOR
       Written by numerous people at CMU from 1994 onwards.  This manual page
       was written by David Huggins-Daines <dhdaines@gmail.com>.

COPYRIGHT
       Copyright  (C)  1994-2016 Carnegie Mellon University.  See the file LI-
       CENSE included with this	package	for more information.

SEE ALSO
       pocketsphinx_batch(1), sphinx_fe(1).

				  2022-09-27		       POCKETSPHINX(1)
