POCKETSPHINX_CONTINUOUS(1) General Commands Manual POCKETSPHINX_CONTINUOUS(1)

NAME

pocketsphinx_continuous - Run speech recognition in continuous listening mode

SYNOPSIS

pocketsphinx_continuous [ -infile filename.wav ] [ -inmic yes ] [ options ]...

DESCRIPTION

This program opens the audio device or a file and waits for speech. When it detects an utterance, it performs speech recognition on it.

To record from the microphone and decode, use

-inmic yes

To decode a 16 kHz, 16-bit mono WAV file, use

-infile filename.wav
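
Whether a file matches the expected input format can be checked before decoding. The following is a minimal sketch using Python's standard wave module; the helper name is_decodable_wav is hypothetical and is not part of pocketsphinx.

```python
import wave


def is_decodable_wav(path):
    """Return True if the WAV file is 16 kHz, 16-bit, mono.

    Hypothetical helper for illustration; not part of pocketsphinx.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sampling rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16-bit
            and wav.getnchannels() == 1   # one channel = mono
        )
```

A file that fails this check would need to be resampled or converted with an external tool before being passed to -infile.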

You can also specify -lm, -fsg, or -kws, depending on whether you are using a statistical language model, using a finite-state grammar, or looking for a keyphrase.
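
The input and search options described above can be combined on one command line. The following sketch assembles such a command as an argument list in Python. The function build_command and the file name model.lm are hypothetical placeholders; only the option names come from this page.

```python
def build_command(wav_path=None, lm=None, fsg=None, kws=None):
    """Assemble an argument list for pocketsphinx_continuous.

    Hypothetical helper for illustration; file names are placeholders.
    """
    cmd = ["pocketsphinx_continuous"]
    # Decode a file when a path is given, otherwise listen on the microphone.
    cmd += ["-infile", wav_path] if wav_path else ["-inmic", "yes"]
    # Append whichever search inputs were supplied.
    for flag, value in (("-lm", lm), ("-fsg", fsg), ("-kws", kws)):
        if value:
            cmd += [flag, value]
    return cmd


print(" ".join(build_command(wav_path="filename.wav", lm="model.lm")))
# prints: pocketsphinx_continuous -infile filename.wav -lm model.lm
```

The resulting list can be handed to subprocess.run on a system where the program is installed.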

OPTIONS

-adcdev
Name of audio device to use for input.
-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh
Initial threshold for automatic gain control
-allphone
Perform phoneme decoding with phonetic lm
-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
-alpha
Preemphasis parameter
-argfile
Argument file giving extra arguments.
-ascale
Inverse of acoustic model scale for confidence score calculation
-aw
Inverse weight applied to acoustic scores.
-backtrace
Print results and backtraces to log file.
-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw
Language model probability weight for bestpath search
-ceplen
Number of components in the input feature vector
-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
-debug
Verbosity level for debugging messages
-dict
Main pronunciation dictionary (lexicon) input file
-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither
Add 1/2-bit noise
-doublebw
Use double bandwidth filters (same center freq)
-ds
Frame GMM computation downsampling ratio
-fdict
Noise word pronunciation dictionary input file
-feat
Feature stream type, depends on the acoustic model
-featparams
File containing feature extraction parameters.
-fillprob
Filler word transition probability
-frate
Frame rate
-fsg
Sphinx format finite state grammar file
-fsgusealtpron
Add alternate pronunciations to FSG
-fsgusefiller
Insert filler words at each state.
-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam
Beam width applied to every frame in second-pass flat search
-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
-fwdtree
Run forward lexicon-tree search (1st pass)
-hmm
Directory containing acoustic model files.
-infile
Audio file to transcribe.
-inmic
Transcribe audio from microphone.
-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf
JSGF grammar file
-keyphrase
Keyphrase to spot
-kws
File with keyphrases to spot, one per line
-kws_delay
Delay to wait for best detection score
-kws_plp
Phone loop probability for keyword spotting
-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
-latsize
Initial backpointer table size
-lda
File containing transformation matrix to be applied to features (single-stream features only)
-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter
Length of sin-curve for liftering, or 0 for no liftering.
-lm
Word trigram language model input file
-lmctl
Specify a set of language models
-lmname
Which language model in -lmctl to use by default
-logbase
Base in which all log-likelihoods are calculated
-logfn
File to write log messages in
-logspec
Write out logspectral files instead of cepstra
-lowerf
Lower edge of filters
-lpbeam
Beam width applied to last phone in words
-lponlybeam
Beam width applied to last phone in single-phone words
-lw
Language model probability weight
-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf
Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef
Model definition input file
-mean
Mixture gaussian means input file
-mfclogdir
Directory to log feature files to
-min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw
Senone mixture weights input file (uncompressed)
-mixwfloor
Senone mixture weights floor (applied to data from -mixw file)
-mllr
MLLR transformation to apply to means and variances
-mmap
Use memory-mapped I/O (if possible) for model files
-ncep
Number of cep coefficients
-nfft
Size of FFT
-nfilt
Number of filter banks
-nwpen
New word transition penalty
-pbeam
Beam width applied to phone transitions
-pip
Phone insertion penalty
-pl_beam
Beam width applied to phone loop search for lookahead
-pl_pbeam
Beam width applied to phone loop transitions for lookahead
-pl_pip
Phone insertion penalty for phone loop
-pl_weight
Weight for phoneme lookahead penalties
-pl_window
Phoneme lookahead window size, in frames
-rawlogdir
Directory to log raw audio files to
-remove_dc
Remove DC offset from each frame
-remove_noise
Remove noise with spectral subtraction in mel-energies
-remove_silence
Enables VAD, removes silence frames from processing
-round_filters
Round mel filter frequencies to DFT points
-samprate
Sampling rate
-seed
Seed for random number generator; if less than zero, pick our own
-sendump
Senone dump (compressed mixture weights) input file
-senlogdir
Directory to log senone score files to
-senmgau
Senone to codebook mapping input file (usually not needed)
-silprob
Silence word transition probability
-smoothspec
Write out cepstral-smoothed logspectral files
-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-time
Print word times in file transcription.
-tmat
HMM state transition matrix input file
-tmatfloor
HMM state transition probability floor (applied to -tmat file)
-topn
Maximum number of top Gaussians to use in scoring.
-topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule
Start rule for JSGF (first public rule is default)
-transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area
Normalize mel filters to unit area
-upperf
Upper edge of filters
-uw
Unigram weight
-vad_postspeech
Number of silence frames to keep after the transition from speech to silence.
-vad_prespeech
Number of speech frames to keep before the transition from silence to speech.
-vad_startspeech
Number of speech frames needed to trigger VAD from silence to speech.
-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var
Mixture gaussian variances input file
-varfloor
Mixture gaussian variance floor (applied to data from -var file)
-varnorm
Variance normalize each utterance (only if CMN == current)
-verbose
Show input filenames
-warp_params
Parameters string defining the warping function
-warp_type
Warping function type (or shape)
-wbeam
Beam width applied to word exits
-wip
Word insertion penalty
-wlen
Hamming window length
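
The example values given for -svspec in the list above can be read as subvectors separated by "/", each made of comma-separated feature indices or ranges. The following Python sketch expands such a string under that reading, treating "a-b" as an inclusive range. This interpretation and the function name parse_svspec are assumptions for illustration, not the decoder's own parser.

```python
def parse_svspec(spec):
    """Expand a subvector specification into lists of feature indices.

    Assumed reading: "/" separates subvectors, "," separates entries,
    and "a-b" is an inclusive range of indices.
    """
    subvectors = []
    for part in spec.split("/"):
        indices = []
        for item in part.split(","):
            if "-" in item:
                start, end = (int(x) for x in item.split("-"))
                indices.extend(range(start, end + 1))
            else:
                indices.append(int(item))
        subvectors.append(indices)
    return subvectors


print([len(v) for v in parse_svspec("0-12/13-25/26-38")])
# prints: [13, 13, 13]
```

Under this reading, both example strings split a 39-dimensional feature vector into three subvectors of 13 components each; they differ only in which indices are grouped together.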

AUTHOR

Written by numerous people at CMU from 1994 onwards. This manual page was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.

COPYRIGHT

Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE included with this package for more information.

SEE ALSO

pocketsphinx_batch(1), sphinx_fe(1).
2016-04-01