
SPHINX_FE(1) General Commands Manual SPHINX_FE(1)


sphinx_fe - Convert audio files to acoustic feature files


sphinx_fe [ options ]...


This program converts audio files (in Microsoft WAV, NIST Sphere, or raw format) to acoustic feature files for input to batch-mode speech recognition. The resulting files can also be used for other purposes. A list of options follows:

-alpha
    Preemphasis parameter
-argfile
    Argument file (e.g. feat.params from an acoustic model) to read parameters from. This will override anything set in other command line arguments.
-blocksize
    Number of samples to read at a time.
-build_outdirs
    Create missing subdirectories in output directory
-c
    Control file for batch processing
-cep2spec
    Input is cepstral files, output is log spectral files
-di
    Input directory, input file names are relative to this, if defined
-dither
    Add 1/2-bit noise
-do
    Output directory, output files are relative to this
-doublebw
    Use double bandwidth filters (same center freq)
-ei
    Input extension to be applied to all input files
-eo
    Output extension to be applied to all output files
-example
    Shows example of how to use the tool
-frate
    Frame rate
-help
    Shows the usage of the tool
-i
    Single audio input file
-input_endian
    Endianness of input data, big or little, ignored if NIST or MS Wav
-lifter
    Length of sin-curve for liftering, or 0 for no liftering.
-logspec
    Write out logspectral files instead of cepstra
-lowerf
    Lower edge of filters
-mach_endian
    Endianness of machine, big or little
-mswav
    Defines input format as Microsoft Wav (RIFF)
-ncep
    Number of cep coefficients
-nchans
    Number of channels of data (interleaved samples assumed)
-nfft
    Size of FFT
-nfilt
    Number of filter banks
-nist
    Defines input format as NIST sphere
-npart
    Number of parts to run in (supersedes -nskip and -runlen if non-zero)
-nskip
    If a control file was specified, the number of utterances to skip at the head of the file
-o
    Single cepstral output file
-ofmt
    Format of output files - one of sphinx, htk, text.
-part
    Index of the part to run (supersedes -nskip and -runlen if non-zero)
-raw
    Defines input format as raw binary data
-remove_dc
    Remove DC offset from each frame
-remove_noise
    Remove noise with spectral subtraction in mel-energies
-remove_silence
    Enables VAD, removes silence frames from processing
-round_filters
    Round mel filter frequencies to DFT points
-runlen
    If a control file was specified, the number of utterances to process, or -1 for all
-samprate
    Sampling rate
-seed
    Seed for random number generator; if less than zero, pick our own
-smoothspec
    Write out cepstral-smoothed logspectral files
-spec2cep
    Input is log spectral files, output is cepstral files
-sph2pipe
    Input is NIST sphere (possibly with Shorten), use sph2pipe to convert
-transform
    Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area
    Normalize mel filters to unit area
-upperf
    Upper edge of filters
-vad_postspeech
    Number of silence frames to keep after a transition from speech to silence.
-vad_prespeech
    Number of speech frames to keep before a transition from silence to speech.
-vad_startspeech
    Number of speech frames needed to trigger VAD from silence to speech.
-vad_threshold
    Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-verbose
    Show input filenames
-warp_params
    Parameters defining the warping function
-warp_type
    Warping function type (or shape)
-whichchan
    Channel to process (numbered from 1), or 0 to mix all channels
-wlen
    Hamming window length
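
As a rough illustration of how the batch-processing options combine, the following Python sketch assembles a sphinx_fe argument list for a control file of WAV inputs. The flag spellings used here (-c, -di, -do, -ei, -eo, -mswav, -build_outdirs, -argfile) and the "yes" values for boolean flags are assumed to match what the installed sphinx_fe prints in its usage message, so verify them before relying on this. The helper name build_fe_command and every path are hypothetical. The sketch only builds the command; it does not run it.

```python
import shlex


def build_fe_command(ctl_file, in_dir, out_dir, params_file=None):
    """Assemble a sphinx_fe argument list for batch conversion.

    Hypothetical helper: flag names are assumed from the tool's usage output.
    """
    cmd = [
        "sphinx_fe",
        "-c", ctl_file,            # control file listing utterance names
        "-di", in_dir,             # input names are relative to this directory
        "-do", out_dir,            # output names are relative to this directory
        "-ei", "wav",              # extension applied to every input name
        "-eo", "mfc",              # extension applied to every output name
        "-mswav", "yes",           # inputs are Microsoft WAV (RIFF)
        "-build_outdirs", "yes",   # create missing output subdirectories
    ]
    if params_file is not None:
        # Parameters read from this file override other command line settings.
        cmd += ["-argfile", params_file]
    return cmd


# Illustrative paths only; none of these files are assumed to exist.
cmd = build_fe_command("train.fileids", "wav", "feat", "model/feat.params")
print(shlex.join(cmd))
```

To execute the command, the list can be handed to subprocess.run(cmd, check=True). Keeping the arguments as a list avoids shell quoting problems with unusual file names.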

Currently, the only features supported are MFCCs (mel-frequency cepstral coefficients). Numerous options control the properties of the output features. It is VERY important that you document the specific set of flags used to create any given set of feature files, since this information is NOT recorded in the files themselves, and any mismatch between the parameters used to extract features for recognition and those used to extract features for training will cause recognition to fail.
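
One possible way to follow this advice is to store the flags in a small sidecar file next to the feature files and compare it against the flags used at recognition time. The Python sketch below shows the idea; the file name fe_flags.json, the helper names, and the flag values are all hypothetical and are not part of sphinx_fe.

```python
import json
import tempfile
from pathlib import Path


def save_flags(feat_dir, flags):
    """Write the extraction flags to a sidecar JSON file (hypothetical layout)."""
    path = Path(feat_dir) / "fe_flags.json"
    path.write_text(json.dumps(flags, indent=2, sort_keys=True))
    return path


def mismatched_flags(train_flags, decode_flags):
    """Return the sorted names of flags whose values differ between two runs."""
    names = set(train_flags) | set(decode_flags)
    return sorted(n for n in names if train_flags.get(n) != decode_flags.get(n))


# Illustrative values only; use whatever flags were really passed to sphinx_fe.
train = {"-samprate": "16000", "-nfilt": "40", "-ncep": "13"}
decode = {"-samprate": "8000", "-nfilt": "40", "-ncep": "13"}

with tempfile.TemporaryDirectory() as feat_dir:
    sidecar = save_flags(feat_dir, train)
    recorded = json.loads(sidecar.read_text())

print(mismatched_flags(recorded, decode))  # prints ['-samprate']
```

A non-empty result means the two feature sets were extracted with different settings and should not be mixed.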


Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines.


Copyright © 1994-2007 Carnegie Mellon University. See the file COPYING included with this package for more information.