kaldi.feat¶
kaldi.feat.fbank¶
Classes
Fbank | Filterbank computer. |
FbankComputer | Filterbank computer. |
FbankOptions | Options for computing filterbank features. |
-
class
kaldi.feat.fbank.
Fbank
¶ Filterbank computer.
Parameters: opts (FbankOptions) – Options for computing filterbank features. -
compute
(wave:VectorBase, vtln_warp:float) → Matrix¶ Computes the filterbank features from input waveform.
This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.
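Parameters: - wave (Vector) – Input waveform.
- vtln_warp (float) – The VTLN warping factor (normally 1.0, meaning no warping).
Returns: The matrix of features, where the row-index is the frame index.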
-
compute_features
(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix¶ Computes the filterbank features from input waveform.
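Parameters: - wave (Vector) – Input waveform.
- sample_freq (float) – Sampling frequency of the waveform.
- vtln_warp (float) – The VTLN warping factor (normally 1.0, meaning no warping).
Returns: The matrix of features, where the row-index is the frame index.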
-
dim
() → int¶ Returns the feature dimension.
-
from_other
(other:Fbank) → Fbank¶ Constructs a new Fbank object from another.
-
-
class
kaldi.feat.fbank.
FbankComputer
¶ Filterbank computer.
This is the low-level interface for computing filterbank features.
Parameters: opts (FbankOptions) – Options for computing filterbank features. -
compute
(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)¶ Computes one feature frame from one signal frame.
Parameters: - signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
- signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
- feature (Vector) – Output frame of features.
-
dim
() → int¶ Returns feature dimension.
-
from_other
(other:FbankComputer) → FbankComputer¶ Constructs a new FbankComputer object from another.
-
get_frame_options
() → FrameExtractionOptions¶ Returns frame extraction options.
-
need_raw_log_energy
() → bool¶ Whether raw log energy is added to features.
-
-
class
kaldi.feat.fbank.
FbankOptions
¶ Options for computing filterbank features.
-
energy_floor
¶ Absolute energy floor used in filterbank computation (default=0.0)
-
frame_opts
¶ Options for frame extraction
-
htk_compat
¶ Whether to put energy last (default=False)
-
mel_opts
¶ Options for Mel banks (default #mel-banks is 23)
-
raw_energy
¶ Whether to compute energy before preemphasis and windowing (default=True)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
use_energy
¶ Whether to add an extra energy dimension to filterbank output (default=False)
-
use_log_fbank
¶ Whether to compute log-filterbank (default) or linear-filterbank
-
use_power
¶ Whether to use power (default) or magnitude
-
kaldi.feat.functions¶
Functions
compute_deltas | Computes delta features. |
compute_power_spectrum | Converts a complex FFT to a power spectrum. |
compute_shift_deltas | Computes shifted delta features. |
init_idft_bases | Initializes IDFT bases. |
reverse_frames | Reverses frames in time. |
sliding_window_cmn | Applies sliding-window cepstral mean and/or variance normalization. |
splice_frames | Splices feature frames. |
Classes
DeltaFeatures | Delta features computer. |
DeltaFeaturesOptions | Options for delta computation. |
ShiftedDeltaFeatures | Shifted delta features computer. |
ShiftedDeltaFeaturesOptions | Options for shifted delta computation. |
SlidingWindowCmnOptions | Options for sliding window CMN computation. |
-
class
kaldi.feat.functions.
DeltaFeatures
(opts:DeltaFeaturesOptions)¶ Delta features computer.
This class provides a low-level function to compute delta features. The function takes as input a matrix of features and a frame index that it should compute the deltas on. It puts its output in an object of type VectorBase, of size (original-feature-dimension) * (opts.order+1). This is not the most efficient way to do the computation, but it’s state-free and thus easier to understand.
Parameters: opts (DeltaFeaturesOptions) – Options for delta computation.
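To make the delta computation concrete, here is a pure-Python sketch of the same arithmetic on plain lists of lists. It is not the PyKaldi API: `delta_scales` and `compute_deltas_sketch` are hypothetical names. The way the regression windows are built (each higher order convolves the previous order's window with a first-order window normalized by the sum of j² over that window) reflects our reading of Kaldi's DeltaFeatures and is an assumption, not something stated on this page. Edge frames are replicated, as the DeltaFeaturesOptions note describes.

```python
# Hypothetical helpers illustrating delta computation; not part of PyKaldi.

def delta_scales(order, window):
    """Return one regression window (list of weights) per delta order 0..order."""
    scales = [[1.0]]  # order 0: the static features themselves
    for _ in range(order):
        prev = scales[-1]
        prev_off = (len(prev) - 1) // 2
        cur_off = prev_off + window
        cur = [0.0] * (len(prev) + 2 * window)
        normalizer = 0.0
        for j in range(-window, window + 1):
            normalizer += j * j
            # Convolve the previous window with the first-order window [-w..w].
            for k in range(-prev_off, prev_off + 1):
                cur[j + k + cur_off] += j * prev[k + prev_off]
        scales.append([c / normalizer for c in cur])
    return scales


def compute_deltas_sketch(feats, order=2, window=2):
    """Return rows of size dim * (order + 1): statics, then deltas of order 1..order."""
    num_frames, dim = len(feats), len(feats[0])
    scales = delta_scales(order, window)
    out = []
    for t in range(num_frames):
        row = []
        for weights in scales:
            off = (len(weights) - 1) // 2
            acc = [0.0] * dim
            for j in range(-off, off + 1):
                # Replicate the first/last frame at the utterance edges.
                src = min(max(t + j, 0), num_frames - 1)
                w = weights[j + off]
                for d in range(dim):
                    acc[d] += w * feats[src][d]
            row.extend(acc)
        out.append(row)
    return out
```

For a linear ramp, the first-order delta of an interior frame equals the slope: `compute_deltas_sketch([[float(t)] for t in range(5)], order=1)[2]` is approximately `[2.0, 1.0]`.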
-
class
kaldi.feat.functions.
DeltaFeaturesOptions
(order:int=2, window:int=2)¶ Options for delta computation.
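Parameters: - order (int) – Delta computation order (default=2).
- window (int) – Delta computation window (default=2).
Note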
The behavior at the edges is to replicate the first or last frame.
-
order
¶ Delta computation order (default=2)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
window
¶ Delta computation window (default=2)
-
-
class
kaldi.feat.functions.
ShiftedDeltaFeatures
(opts:ShiftedDeltaFeaturesOptions)¶ Shifted delta features computer.
This class provides a low-level function to compute shifted delta cepstra (SDC). The function takes as input a matrix of features and a frame index that it should compute the deltas on. It puts its output in an object of type VectorBase, of size original-feature-dimension * (num_blocks + 1).
Parameters: opts (ShiftedDeltaFeaturesOptions) – Options for shifted delta computation.
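A pure-Python sketch of shifted delta cepstra follows, using the ShiftedDeltaFeaturesOptions defaults (window=1, num_blocks=7, block_shift=3). It assumes, based on our reading of Kaldi's ShiftedDeltaFeatures, that each frame's output is the static features followed by num_blocks first-order deltas, block i being centered block_shift * i frames ahead of the current frame and normalized by the sum of j², with frame indices clamped at the edges. `compute_shift_deltas_sketch` is a hypothetical name, not a PyKaldi function.

```python
# Hypothetical helper illustrating shifted delta cepstra; not part of PyKaldi.

def compute_shift_deltas_sketch(feats, window=1, num_blocks=7, block_shift=3):
    """Return rows of size dim * (num_blocks + 1): statics, then one delta per block."""
    num_frames, dim = len(feats), len(feats[0])
    normalizer = float(sum(j * j for j in range(-window, window + 1)))
    out = []
    for t in range(num_frames):
        row = list(feats[t])  # static features first
        for i in range(num_blocks):
            acc = [0.0] * dim
            for j in range(-window, window + 1):
                # Block i is centred block_shift * i frames ahead of t; indices
                # past either edge are clamped to the first/last frame.
                src = min(max(t + i * block_shift + j, 0), num_frames - 1)
                for d in range(dim):
                    acc[d] += (j / normalizer) * feats[src][d]
            row.extend(acc)
        out.append(row)
    return out
```

On a linear ramp with unit slope, every block of an interior frame evaluates to 1.0, so frame 5 of a 30-frame ramp becomes `[5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]`.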
-
class
kaldi.feat.functions.
ShiftedDeltaFeaturesOptions
¶ Options for shifted delta computation.
-
block_shift
¶ Distance between each block (default=3)
-
num_blocks
¶ Number of blocks in advance of each frame to be concatenated (default=7)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
window
¶ Size of time delay and advance (default=1)
-
-
class
kaldi.feat.functions.
SlidingWindowCmnOptions
¶ Options for sliding window CMN computation.
-
center
¶ Whether to center the window on the current frame (default=False)
-
check
()¶ Checks if option values are valid.
- Throws:
- RuntimeError: If option values are not valid.
-
cmn_window
¶ Window size for average CMN computation (default=600)
-
max_warnings
¶ Maximum number of warnings to report per utterance (default=5)
-
min_window
¶ Minimum CMN window used at start of decoding (default=100)
-
normalize_variance
¶ Whether to normalize variance to one (default=False)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
-
kaldi.feat.functions.
compute_deltas
(delta_opts:DeltaFeaturesOptions, input_features:MatrixBase) → Matrix¶ Computes delta features.
Parameters: - delta_opts (DeltaFeaturesOptions) – Options for delta computation.
- input_features (Matrix) – Input feature matrix.
Returns: A matrix representing output delta features.
Return type: Matrix
Note
This convenience function computes delta features for an entire feature matrix. If you want to deal with features coming in frame by frame you can use the DeltaFeatures class.
-
kaldi.feat.functions.
compute_power_spectrum
(complex_fft:VectorBase)¶ Converts a complex FFT to a power spectrum.
If the input complex FFT is a vector of size n (representing half the complex FFT of a real signal of size n), this function overwrites the first (n/2) + 1 elements of it with the energies of the FFT bins from zero to the Nyquist frequency. Contents of the remaining (n/2) - 1 elements are undefined at output.
Parameters: complex_fft (Vector) – Complex FFT to be converted to a power spectrum.
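The values that end up in the first (n/2) + 1 elements are the bin energies |X[k]|² for k = 0 (DC) up to k = n/2 (Nyquist). The sketch below computes those same energies from a real signal with a naive O(n²) DFT. It does not reproduce Kaldi's in-place operation on its packed FFT output; `power_spectrum_sketch` is a hypothetical helper used only to show which quantities are produced.

```python
import cmath

# Hypothetical helper: bin energies of a real signal, not part of PyKaldi.
def power_spectrum_sketch(signal):
    """Return |X[k]|^2 for DFT bins k = 0 .. n/2 of a real-valued signal."""
    n = len(signal)
    energies = []
    for k in range(n // 2 + 1):  # DC up to and including Nyquist
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        energies.append(x_k.real ** 2 + x_k.imag ** 2)
    return energies
```

An impulse `[1.0, 0.0, 0.0, 0.0]` has a flat spectrum (all three energies equal 1), while a constant signal puts all of its energy in the DC bin.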
-
kaldi.feat.functions.
compute_shift_deltas
(delta_opts:ShiftedDeltaFeaturesOptions, input_features:MatrixBase) → Matrix¶ Computes shifted delta features.
Parameters: - delta_opts (ShiftedDeltaFeaturesOptions) – Options for shifted delta computation.
- input_features (Matrix) – Input feature matrix.
Returns: A matrix representing output shifted delta features.
Return type: Matrix
Note
This convenience function computes shifted delta features for an entire feature matrix. If you want to deal with features coming in frame by frame you can use the ShiftedDeltaFeatures class.
-
kaldi.feat.functions.
init_idft_bases
(n_bases:int, dimension:int) → Matrix¶ Initializes IDFT bases.
Parameters: - n_bases (int) – Number of IDFT bases.
- dimension (int) – Dimension of each basis.
Returns: A matrix representing IDFT bases.
Return type: Matrix
-
kaldi.feat.functions.
reverse_frames
(input_features:MatrixBase) → Matrix¶ Reverses frames in time.
This function is used for backwards decoding.
Parameters: input_features (Matrix) – Input feature matrix. Returns: A matrix representing output features. Return type: Matrix
-
kaldi.feat.functions.
sliding_window_cmn
(opts:SlidingWindowCmnOptions, input:MatrixBase, output:MatrixBase)¶ Applies sliding-window cepstral mean and/or variance normalization.
Input and output feature matrices must have the same dimension.
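The window placement behind sliding-window CMN can be sketched as follows. This is a hypothetical pure-Python helper modeled on our reading of Kaldi's SlidingWindowCmn, using the cmn_window, min_window and center options from SlidingWindowCmnOptions. Only mean normalization is shown (normalize_variance=False), the max_warnings logic is omitted, and the exact edge handling is an assumption.

```python
# Hypothetical helper illustrating sliding-window CMN; not part of PyKaldi.

def sliding_window_cmn_sketch(feats, cmn_window=600, min_window=100, center=False):
    """Subtract from each frame the mean of the frames in its CMN window."""
    num_frames, dim = len(feats), len(feats[0])
    out = []
    for t in range(num_frames):
        if center:
            start = t - cmn_window // 2
            end = start + cmn_window  # one past the last frame used
        else:
            start = t - cmn_window
            end = t + 1
        if start < 0:  # shift the window right if it starts before frame 0
            end -= start
            start = 0
        if not center:
            # Without centering, look ahead only far enough to reach min_window.
            end = max(t + 1, min_window)
        if end > num_frames:  # shift the window left if it runs past the end
            start = max(0, start - (end - num_frames))
            end = num_frames
        count = end - start
        mean = [sum(feats[s][d] for s in range(start, end)) / count for d in range(dim)]
        out.append([feats[t][d] - mean[d] for d in range(dim)])
    return out
```

With `center=True` and `cmn_window=2`, the frames `[[0.0], [2.0], [4.0], [6.0]]` become `[[-1.0], [1.0], [1.0], [1.0]]`; a constant input always normalizes to zeros.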
-
kaldi.feat.functions.
splice_frames
(input_features:MatrixBase, left_context:int, right_context:int) → Matrix¶ Splices feature frames.
This function is normally used together with LDA. It splices frames together to make a window.
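Parameters: - input_features (Matrix) – Input feature matrix.
- left_context (int) – Number of left context frames.
- right_context (int) – Number of right context frames.
Returns: A matrix representing output features.
Return type: Matrix
- Throws: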
- RuntimeError: If input feature matrix is empty.
Note
At the start and end of an utterance, it duplicates the first and last frames. Number of left and right context frames must be non-negative.
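Frame splicing amounts to concatenating each frame with left_context frames before it and right_context frames after it, so the output dimension is dim * (left_context + 1 + right_context). The sketch below is a hypothetical pure-Python helper, not the PyKaldi function. Raising ValueError for negative contexts is this sketch's own choice, since the exception type for that case is not documented here.

```python
# Hypothetical helper illustrating frame splicing; not part of PyKaldi.

def splice_frames_sketch(feats, left_context, right_context):
    """Concatenate each frame with its left and right context frames."""
    if not feats:
        raise RuntimeError("input feature matrix is empty")
    if left_context < 0 or right_context < 0:
        raise ValueError("context sizes must be non-negative")
    num_frames = len(feats)
    out = []
    for t in range(num_frames):
        row = []
        for j in range(-left_context, right_context + 1):
            # Duplicate the first/last frame at the start/end of the utterance.
            src = min(max(t + j, 0), num_frames - 1)
            row.extend(feats[src])
        out.append(row)
    return out
```

For example, splicing `[[1.0], [2.0], [3.0]]` with one frame of context on each side yields `[[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]`.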
kaldi.feat.mel¶
Functions
compute_lifter_coeffs | Computes liftering coefficients. |
compute_lpc | Computes LP coefficients from autocorrelation coefficients. |
get_equal_loudness_vector | Computes equal loudness vector. |
Classes
MelBanks | Mel filterbanks. |
MelBanksOptions | Options for Mel filterbanks. |
-
class
kaldi.feat.mel.
MelBanks
¶ Mel filterbanks.
Parameters: - opts (MelBanksOptions) – Options for Mel filterbanks.
- frame_opts (FrameExtractionOptions) – Options for frame extraction.
- vtln_warp_factor (float) – VTLN warp factor.
-
compute
(fft_energies:VectorBase, mel_energies_out:VectorBase)¶ Computes Mel energies.
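Parameters: - fft_energies (Vector) – Input FFT energies of one frame.
- mel_energies_out (Vector) – Output Mel energies.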
-
from_other
(other:MelBanks) → MelBanks¶ Creates a new MelBanks object from another.
-
get_center_freqs
() → Vector¶ Returns vector of center frequencies of each Mel bin.
-
inverse_mel_scale
(mel_freq:float) → float¶ Computes inverse Mel scale for the given Mel-frequency.
-
mel_scale
(freq:float) → float¶ Computes Mel scale for the given frequency.
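Kaldi's Mel scale is mel(f) = 1127 ln(1 + f / 700), with inverse f = 700 (exp(mel / 1127) - 1). The sketch below pairs those two formulas with a triangular filterbank: num_bins triangles equally spaced in Mel between low_freq and high_freq, applied as a dot product with the FFT energies. The filterbank construction (non-HTK mode, weights defined over the first fft_size / 2 FFT bins) is our assumption about how MelBanks builds its bins. `mel_bank_weights`, `mel_energies` and the `fft_size` parameter are hypothetical.

```python
import math

# Hypothetical helpers illustrating MelBanks; not part of PyKaldi.

def mel_scale(freq):
    return 1127.0 * math.log(1.0 + freq / 700.0)


def inverse_mel_scale(mel_freq):
    return 700.0 * (math.exp(mel_freq / 1127.0) - 1.0)


def mel_bank_weights(num_bins, fft_size, sample_freq, low_freq=20.0, high_freq=0.0):
    """Triangular weights, one row per Mel bin, over FFT bins 0 .. fft_size/2 - 1."""
    if high_freq <= 0.0:
        high_freq += 0.5 * sample_freq  # 0: Nyquist; negative: offset from Nyquist
    mel_low, mel_high = mel_scale(low_freq), mel_scale(high_freq)
    mel_delta = (mel_high - mel_low) / (num_bins + 1)
    bin_width = sample_freq / fft_size  # Hz per FFT bin
    weights = []
    for b in range(num_bins):
        left = mel_low + b * mel_delta
        center = left + mel_delta
        right = center + mel_delta
        row = []
        for i in range(fft_size // 2):
            mel = mel_scale(i * bin_width)
            if left < mel < right:
                # Rising edge up to the center, falling edge after it.
                if mel <= center:
                    row.append((mel - left) / (center - left))
                else:
                    row.append((right - mel) / (right - center))
            else:
                row.append(0.0)
        weights.append(row)
    return weights


def mel_energies(weights, fft_energies):
    """Dot each triangular filter with the FFT energies of one frame."""
    return [sum(w * e for w, e in zip(row, fft_energies)) for row in weights]
```

For example, `mel_bank_weights(23, 512, 16000.0)` gives 23 rows of 256 weights, each between 0 and 1. Taking the log of `mel_energies(...)` yields log-filterbank features.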
-
num_bins
() → int¶ Returns number of Mel bins.
-
vtln_warp_freq
(vtln_low_cutoff:float, vtln_high_cutoff:float, low_freq:float, high_freq:float, vtln_warp_factor:float, freq:float) → float¶ Computes VTLN warp frequency
-
vtln_warp_mel_freq
(vtln_low_cutoff:float, vtln_high_cutoff:float, low_freq:float, high_freq:float, vtln_warp_factor:float, mel_freq:float) → float¶ Computes VTLN warp Mel-frequency
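The piecewise-linear VTLN warping function can be sketched as below. It mirrors our reading of Kaldi's MelBanks::VtlnWarpFreq and should be treated as an assumption. The middle segment scales frequencies by 1 / vtln_warp_factor. The outer segments are straight lines that keep low_freq and high_freq fixed and join the middle segment at the inflection points. Here the cutoffs are absolute frequencies in Hz, and vtln_warp_mel_freq presumably corresponds to applying mel_scale to the result.

```python
# Hypothetical helper illustrating VTLN frequency warping; not part of PyKaldi.

def vtln_warp_freq(vtln_low_cutoff, vtln_high_cutoff, low_freq, high_freq,
                   vtln_warp_factor, freq):
    """Piecewise-linear VTLN warping of a frequency in Hz."""
    if freq < low_freq or freq > high_freq:
        return freq  # frequencies outside [low_freq, high_freq] are not warped
    l = vtln_low_cutoff * max(1.0, vtln_warp_factor)   # lower inflection point
    h = vtln_high_cutoff * min(1.0, vtln_warp_factor)  # upper inflection point
    scale = 1.0 / vtln_warp_factor
    fl, fh = scale * l, scale * h  # where the inflection points are mapped to
    if freq < l:
        # Line through (low_freq, low_freq) and (l, fl).
        return low_freq + (fl - low_freq) / (l - low_freq) * (freq - low_freq)
    if freq < h:
        return scale * freq  # middle segment: plain scaling
    # Line through (h, fh) and (high_freq, high_freq).
    return high_freq + (high_freq - fh) / (high_freq - h) * (freq - high_freq)
```

A warp factor of 1.0 leaves every frequency unchanged. With a factor of 1.1, 1000 Hz maps to about 909 Hz, while frequencies outside [low_freq, high_freq] and the endpoint high_freq stay where they are.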
-
class
kaldi.feat.mel.
MelBanksOptions
¶ Options for Mel filterbanks.
Parameters: num_bins (int) – Number of triangular Mel-frequency bins (default=25). -
debug_mel
¶ Print out debugging information for Mel bin computation (default=False)
-
high_freq
¶ High cutoff frequency for Mel bins: if 0, no cutoff; if < 0, an offset from the Nyquist frequency (default=0).
-
htk_mode
¶ Enables more exact compatibility with HTK (default=False)
-
low_freq
¶ Low cutoff frequency for Mel bins (default=20)
-
num_bins
¶ Number of triangular Mel-frequency bins (default=25)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
vtln_high
¶ High inflection point in piecewise linear VTLN warping function, if < 0 offset from high_freq (default=-500)
-
vtln_low
¶ Low inflection point in piecewise linear VTLN warping function (default=100)
-
-
kaldi.feat.mel.
compute_lifter_coeffs
(Q:float, coeffs:VectorBase)¶ Computes liftering coefficients.
Coefficients are numbered slightly differently from HTK. The zeroth index is C0, which is not affected.
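Parameters: - Q (float) – Cepstral lifter constant.
- coeffs (Vector) – Output vector of liftering coefficients.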
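The liftering weights are sinusoidal in the coefficient index. The sketch below assumes the form coeffs[i] = 1 + (Q / 2) sin(π i / Q), which is our understanding of Kaldi's ComputeLifterCoeffs. Index 0 always gets weight 1, matching the statement that C0 is not affected. Unlike the PyKaldi function, which fills an existing vector, this hypothetical helper returns a new list.

```python
import math

# Hypothetical helper illustrating cepstral liftering; not part of PyKaldi.
def compute_lifter_coeffs(q, dim):
    """Sinusoidal liftering weights; index 0 (C0) always gets weight 1."""
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(dim)]
```

With Q = 22 (the default cepstral_lifter) and 13 coefficients, the weights rise from 1.0 at C0 to a peak of 12.0 at index 11. Cepstra are liftered by multiplying them element-wise by these weights.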
-
kaldi.feat.mel.
compute_lpc
(autocorr_in:VectorBase, lpc_out:Vector) → float¶ Computes LP coefficients from autocorrelation coefficients.
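Parameters: - autocorr_in (Vector) – Input autocorrelation coefficients.
- lpc_out (Vector) – Output LP coefficients.
Returns: The log energy of the residual.
Return type: float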
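Computing LP coefficients from autocorrelation coefficients is classically done with the Levinson-Durbin recursion. The sketch below is a textbook version, not the PyKaldi function, and `compute_lpc_sketch` is a hypothetical name. It uses the convention x[n] ≈ Σ a[k-1] x[n-k]. Kaldi's coefficients may use the opposite sign convention. The floor on 1 - k² is this sketch's own guard against a zero residual (for example, on a constant signal).

```python
import math

# Hypothetical helper illustrating Levinson-Durbin LPC; not part of PyKaldi.
def compute_lpc_sketch(autocorr):
    """Return (a, log_residual_energy) from autocorrelations r[0..p].

    a[0..p-1] are predictor coefficients with x[n] ~ sum_k a[k-1] * x[n-k].
    """
    p = len(autocorr) - 1
    a = [0.0] * p
    energy = autocorr[0]
    for i in range(p):
        # Reflection coefficient for model order i + 1.
        k = (autocorr[i + 1] - sum(a[j] * autocorr[i - j] for j in range(i))) / energy
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        # Prediction error shrinks by (1 - k^2); floored to keep the log finite.
        energy *= max(1.0 - k * k, 1e-5)
    return a, math.log(energy)
```

For the autocorrelation `[1.0, 0.5, 0.25]` of a first-order autoregressive process, the recursion recovers `a = [0.5, 0.0]` and a residual energy of 0.75.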
kaldi.feat.mfcc¶
Classes
Mfcc | MFCC computer. |
MfccComputer | MFCC computer. |
MfccOptions | Options for computing MFCC features. |
-
class
kaldi.feat.mfcc.
Mfcc
¶ MFCC computer.
Parameters: opts (MfccOptions) – Options for computing MFCC features. -
compute
(wave:VectorBase, vtln_warp:float) → Matrix¶ Computes the MFCC features from input waveform.
This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.
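Parameters: - wave (Vector) – Input waveform.
- vtln_warp (float) – The VTLN warping factor (normally 1.0, meaning no warping).
Returns: The matrix of features, where the row-index is the frame index.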
-
compute_features
(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix¶ Computes the MFCC features from input waveform.
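Parameters: - wave (Vector) – Input waveform.
- sample_freq (float) – Sampling frequency of the waveform.
- vtln_warp (float) – The VTLN warping factor (normally 1.0, meaning no warping).
Returns: The matrix of features, where the row-index is the frame index.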
-
dim
() → int¶ Returns the feature dimension.
-
from_other
(other:Mfcc) → Mfcc¶ Constructs a new Mfcc object from another.
-
-
class
kaldi.feat.mfcc.
MfccComputer
¶ MFCC computer.
This is the low-level interface for computing MFCC features.
Parameters: opts (MfccOptions) – Options for computing MFCC features. -
compute
(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)¶ Computes one feature frame from one signal frame.
Parameters: - signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
- signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
- feature (Vector) – Output frame of features.
-
dim
() → int¶ Returns feature dimension.
-
from_other
(other:MfccComputer) → MfccComputer¶ Constructs a new MfccComputer object from another.
-
get_frame_options
() → FrameExtractionOptions¶ Returns frame extraction options.
-
need_raw_log_energy
() → bool¶ Whether raw log energy is added to features.
-
-
class
kaldi.feat.mfcc.
MfccOptions
¶ Options for computing MFCC features.
-
cepstral_lifter
¶ Constant controlling the scaling of MFCCs (default=22)
-
energy_floor
¶ Absolute energy floor used in MFCC computation (default=0.0)
-
frame_opts
¶ Options for frame extraction
-
htk_compat
¶ Whether to put energy (or sqrt(2)*C0) last (default=False)
-
mel_opts
¶ Options for Mel banks (default #mel-banks is 23)
-
num_ceps
¶ Number of cepstral coefficients including C0 (default=13)
-
raw_energy
¶ Whether to compute energy before preemphasis and windowing (default=True)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
use_energy
¶ Whether to use energy (not C0) in MFCC computation (default=True)
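The options above come together in the last steps of MFCC computation: a DCT of the log-Mel energies, truncated to num_ceps coefficients and scaled by cepstral_lifter. The sketch below assumes an orthonormal DCT-II (row 0 scaled by √(1/N), the other rows by √(2/N)), which is our understanding of Kaldi's DCT matrix. It also assumes the sinusoidal lifter 1 + (Q/2) sin(πi/Q). The energy handling controlled by use_energy and htk_compat is omitted, and `dct_matrix` and `mfcc_from_log_mel` are hypothetical names.

```python
import math

# Hypothetical helpers illustrating the DCT + lifter stage; not part of PyKaldi.

def dct_matrix(num_ceps, num_bins):
    """Rows 0 .. num_ceps-1 of an orthonormal DCT-II over num_bins inputs."""
    rows = [[math.sqrt(1.0 / num_bins)] * num_bins]
    norm = math.sqrt(2.0 / num_bins)
    for k in range(1, num_ceps):
        rows.append([norm * math.cos(math.pi / num_bins * (n + 0.5) * k)
                     for n in range(num_bins)])
    return rows


def mfcc_from_log_mel(log_mel, num_ceps=13, cepstral_lifter=22.0):
    """DCT of one frame's log-Mel energies, truncated to num_ceps and liftered."""
    dct = dct_matrix(num_ceps, len(log_mel))
    ceps = [sum(w * x for w, x in zip(row, log_mel)) for row in dct]
    if cepstral_lifter != 0.0:
        ceps = [c * (1.0 + 0.5 * cepstral_lifter * math.sin(math.pi * i / cepstral_lifter))
                for i, c in enumerate(ceps)]
    return ceps
```

A flat log-Mel vector of 23 values equal to 2.0 maps entirely onto C0 (2√23 ≈ 9.59), and the remaining 12 coefficients are zero up to rounding error.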
-
kaldi.feat.online¶
Classes
OnlineAppendFeature | Online feature concatenation. |
OnlineCacheFeature | Online feature caching. |
OnlineCmvn | Online CMVN features. |
OnlineCmvnOptions | Options for online CMVN. |
OnlineCmvnState | Online CMVN state. |
OnlineDeltaFeature | Online delta features. |
OnlineFbank | Online filterbank features extractor. |
OnlineMatrixFeature | Online matrix features. |
OnlineMfcc | Online MFCC features extractor. |
OnlinePlp | Online PLP features extractor. |
OnlineSpliceFrames | Online feature splicing. |
OnlineSpliceOptions | Options for online feature splicing. |
OnlineTransform | Online feature transformation. |
-
class
kaldi.feat.online.
OnlineAppendFeature
¶ Online feature concatenation.
Parameters: - src1 (OnlineFeatureInterface) – First source of online features.
- src2 (OnlineFeatureInterface) – Second source of online features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
class
kaldi.feat.online.
OnlineCacheFeature
¶ Online feature caching.
Parameters: src (OnlineFeatureInterface) – Source online features. -
clear_cache
()¶ Clears feature cache.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
-
class
kaldi.feat.online.
OnlineCmvn
¶ Online CMVN features.
Parameters: - opts (OnlineCmvnOptions) – Options for online CMVN features.
- cmvn_state (OnlineCmvnState) – Online CMVN state.
- src (OnlineFeatureInterface) – Source online features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
freeze
(cur_frame:int)¶ Freezes online CMVN state
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
get_state
(cur_frame:int) → OnlineCmvnState¶ Returns the online CMVN state for current frame
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
set_state
(cmvn_state:OnlineCmvnState)¶ Sets online CMVN state
-
without_state
(opts:OnlineCmvnOptions, src:OnlineFeatureInterface) → OnlineCmvn¶ Creates a new OnlineCmvn object without an initial CMVN state.
-
class
kaldi.feat.online.
OnlineCmvnOptions
¶ Options for online CMVN.
-
check
()¶ Checks if options are valid.
- Throws:
- RuntimeError: If options are not valid.
-
cmn_window
¶ Number of frames of sliding context for CMN (default=600)
-
global_frames
¶ Number of global frames to use in CMN for first utterance of a speaker (default=200)
-
modulus
¶ Relates to how CMVN is computed internally (default=20)
-
normalize_mean
¶ If True, do cepstral mean normalization (default=True)
-
normalize_variance
¶ If True, do cepstral mean and variance normalization (default=False)
-
register
(po:ParseOptions)¶ Registers options with a command-line option parser.
Parameters: po (ParseOptions) – Command-line option parser.
-
ring_buffer_size
¶ Size of ring buffer used for caching CMVN stats (default=20)
-
skip_dims
¶ Colon-separated list of dimensions to skip (default="")
-
speaker_frames
¶ Number of frames from this speaker to use in CMN (default=6000)
-
-
class
kaldi.feat.online.
OnlineCmvnState
¶ Online CMVN state.
-
from_other
(other:OnlineCmvnState) → OnlineCmvnState¶ Constructs a new OnlineCmvnState object from another.
-
from_stats
(global_stats:DoubleMatrix) → OnlineCmvnState¶ Constructs a new OnlineCmvnState object from global stats matrix.
-
frozen_state
¶ If nonempty, CMVN stats representing the frozen state.
-
global_cmvn_stats
¶ Global CMVN stats
-
read
(is:istream, binary:bool)¶ Reads online CMVN stats from input stream.
-
speaker_cmvn_stats
¶ Total CMVN stats for this speaker
-
write
(os:ostream, binary:bool)¶ Writes online CMVN stats to output stream.
-
-
class
kaldi.feat.online.
OnlineDeltaFeature
¶ Online delta features.
Parameters: - opts (DeltaFeaturesOptions) – Options for delta features.
- src (OnlineFeatureInterface) – Source online features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
class
kaldi.feat.online.
OnlineFbank
¶ Online filterbank features extractor.
Parameters: opts (FbankOptions) – Options for extracting filterbank features. -
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts input waveform for feature extraction.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
input_finished
()¶ Marks input as finished.
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
-
class
kaldi.feat.online.
OnlineMatrixFeature
¶ Online matrix features.
Parameters: mat (Matrix) – Source feature matrix. -
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
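All online feature classes in this module expose the same polling methods: dim, frame_shift_in_seconds, num_frames_ready, is_last_frame and get_frame. A consumer therefore reads whatever frames are ready and stops at the last frame. The sketch below illustrates that pattern with a hypothetical pure-Python stand-in, `ToyMatrixFeature`, that mimics OnlineMatrixFeature. It is not the PyKaldi class: PyKaldi's get_frame fills a Kaldi vector, whereas here a Python list plays that role, and `drain` is likewise a hypothetical helper.

```python
# Hypothetical stand-in mimicking OnlineMatrixFeature's methods; not part of PyKaldi.
class ToyMatrixFeature:
    """Every row of the matrix is ready immediately."""

    def __init__(self, mat, frame_shift=0.01):
        self._mat = [list(row) for row in mat]
        self._shift = frame_shift

    def dim(self):
        return len(self._mat[0]) if self._mat else 0

    def frame_shift_in_seconds(self):
        return self._shift

    def num_frames_ready(self):
        return len(self._mat)

    def is_last_frame(self, frame):
        return frame == len(self._mat) - 1

    def get_frame(self, frame, feat):
        feat[:] = self._mat[frame]  # fill the caller's buffer in place


def drain(src, start=0):
    """Hypothetical consumer: read ready frames from start; return (frames, next_start)."""
    frames = []
    feat = [0.0] * src.dim()
    t = start
    while t < src.num_frames_ready():
        src.get_frame(t, feat)
        frames.append(list(feat))
        t += 1
        if src.is_last_frame(t - 1):
            break
    return frames, t
```

In a streaming setting the same loop would be called repeatedly, passing the returned `next_start` back in as more frames become ready.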
-
-
class
kaldi.feat.online.
OnlineMfcc
¶ Online MFCC features extractor.
Parameters: opts (MfccOptions) – Options for extracting MFCC features. -
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts input waveform for feature extraction.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
input_finished
()¶ Marks input as finished.
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
-
class
kaldi.feat.online.
OnlinePlp
¶ Online PLP features extractor.
Parameters: opts (PlpOptions) – Options for extracting PLP features. -
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts input waveform for feature extraction.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
input_finished
()¶ Marks input as finished.
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
-
class
kaldi.feat.online.
OnlineSpliceFrames
¶ Online feature splicing.
Parameters: - opts (OnlineSpliceOptions) – Options for online feature splicing.
- src (OnlineFeatureInterface) – Source online features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
class
kaldi.feat.online.
OnlineSpliceOptions
¶ Options for online feature splicing.
-
left_context
¶ Left-context for frame splicing prior to LDA (default=4)
-
register
(po:ParseOptions)¶ Registers options with a command-line option parser.
Parameters: po (ParseOptions) – Command-line option parser.
-
right_context
¶ Right-context for frame splicing prior to LDA (default=4)
-
-
class
kaldi.feat.online.
OnlineTransform
¶ Online feature transformation.
Parameters: - transform (Matrix) – Feature transformation matrix.
- src (OnlineFeatureInterface) – Source online features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
kaldi.feat.pitch¶
Functions
compute_and_process_kaldi_pitch | Computes and postprocesses pitch features. |
compute_kaldi_pitch | Computes pitch features. |
process_pitch | Postprocesses pitch features. |
Classes
OnlinePitchFeature | Online pitch feature extractor. |
OnlineProcessPitch | Online pitch postprocessor. |
PitchExtractionOptions | Options for pitch extraction. |
ProcessPitchOptions | Options for pitch postprocessing. |
-
class
kaldi.feat.pitch.
OnlinePitchFeature
¶ Online pitch feature extractor.
Parameters: opts (PitchExtractionOptions) – Options for pitch extraction. -
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts input waveform for feature extraction.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
input_finished
()¶ Marks input as finished.
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
-
class
kaldi.feat.pitch.
OnlineProcessPitch
¶ Online pitch postprocessor.
Parameters: - opts (ProcessPitchOptions) – Options for pitch postprocessing.
- src (OnlineFeatureInterface) – Source pitch features.
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames_ready
() → int¶ Returns number of frames ready
-
class
kaldi.feat.pitch.
PitchExtractionOptions
¶ Options for pitch extraction.
-
delta_pitch
¶ Smallest relative change in pitch that the algorithm measures (default=0.005)
-
frame_length_ms
¶ Frame length in milliseconds (default=25)
-
frame_shift_ms
¶ Frame shift in milliseconds (default=10)
-
frames_per_chunk
¶ Used for compatibility with online decoding (default=0)
-
lowpass_cutoff
¶ Cutoff frequency for low pass filter in Hz (default=1000)
-
lowpass_filter_width
¶ Width of low pass filter, larger gives sharper filter (default=1)
-
max_f0
¶ max F0 to search for in Hz (default=400)
-
max_frames_latency
¶ Max number of frames of latency allowed (default=0)
-
min_f0
¶ min F0 to search for in Hz (default=50)
-
nccf_ballast
¶ Increasing this factor reduces NCCF for quiet frames (default=7000)
-
nccf_ballast_online
¶ Useful for debug. Affects how NCCF ballast is computed (default=False)
-
nccf_window_shift
¶ NCCF window shift
-
nccf_window_size
¶ NCCF window size
-
penalty_factor
¶ Cost factor for F0 change (default=0.1)
-
preemph_coeff
¶ Coefficient for use in signal preemphasis (default=0.0)
-
recompute_frame
¶ Used for compatibility with online decoding (default=500)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
resample_freq
¶ Frequency to which the signal is downsampled; must be greater than 2*lowpass_cutoff (default=4000)
-
samp_freq
¶ Sample frequency, must match the waveform (default=16000)
-
simulate_first_pass_online
¶ Output features that correspond to what an online decoder would see (default=False)
-
snip_edges
¶ Whether to omit features for incomplete frames near edges (default=True)
-
soft_min_f0
¶ min F0, applied in a soft way, must not exceed min_f0 (default=10)
-
upsample_filter_width
¶ Width of filter used in upsampling NCCF (default=5)
-
-
class
kaldi.feat.pitch.
ProcessPitchOptions
¶ Options for pitch postprocessing.
-
add_delta_pitch
¶ If true, time derivative of log-pitch is added to output features (default=True)
-
add_normalized_log_pitch
¶ If true, the normalized-log-pitch is added to output features (default=True)
-
add_pov_feature
¶ If true, the warped NCCF is added to output features (default=True)
-
add_raw_log_pitch
¶ If true, log(pitch) is added to output features (default=False)
-
delay
¶ Number of frames by which the pitch information is delayed (default=0)
-
delta_pitch_noise_stddev
¶ Standard deviation for noise we add to the delta log-pitch (default=0.005)
-
delta_pitch_scale
¶ Term to scale the final delta log-pitch feature (default=10.0)
-
delta_window
¶ Number of frames on each side of central frame, to use for delta window (default=2)
-
normalization_left_context
¶ Left-context (in frames) for moving window normalization (default=275)
-
normalization_right_context
¶ Right-context (in frames) for moving window normalization (default=75)
-
pitch_scale
¶ Scaling factor for the final normalized log-pitch value (default=2.0)
-
pov_offset
¶ Offset for final POV feature, useful in online decoding (default=0.0)
-
pov_scale
¶ Scaling factor for final POV (probability of voicing) feature (default=2.0)
-
register
(po:ParseOptions)¶ Registers options with a command-line option parser.
Parameters: po (ParseOptions) – Command-line option parser.
-
-
kaldi.feat.pitch.
compute_and_process_kaldi_pitch
(pitch_opts:PitchExtractionOptions, process_opts:ProcessPitchOptions, wave:VectorBase) → Matrix¶ Computes and postprocesses pitch features.
Parameters: - pitch_opts (PitchExtractionOptions) – Options for pitch extraction.
- process_opts (ProcessPitchOptions) – Options for pitch postprocessing.
- wave (Vector) – Input waveform.
Returns: Postprocessed pitch features.
Return type: Matrix
-
kaldi.feat.pitch.
compute_kaldi_pitch
(opts:PitchExtractionOptions, wave:VectorBase) → Matrix¶ Computes pitch features.
Parameters: - opts (PitchExtractionOptions) – Options for pitch extraction.
- wave (Vector) – Input waveform.
Returns: Pitch features.
Return type: Matrix
-
kaldi.feat.pitch.
process_pitch
(opts:ProcessPitchOptions, input:MatrixBase) → Matrix¶ Postprocesses pitch features.
Parameters: - opts (ProcessPitchOptions) – Options for pitch postprocessing.
- input (Matrix) – Input pitch features.
Returns: Postprocessed pitch features.
Return type: Matrix
kaldi.feat.plp¶
Classes
Plp | PLP computer. |
PlpComputer | PLP computer. |
PlpOptions | Options for computing PLP features. |
-
class
kaldi.feat.plp.
Plp
¶ PLP computer.
Parameters: opts (PlpOptions) – Options for computing PLP features. -
compute
(wave:VectorBase, vtln_warp:float) → Matrix¶ Computes the PLP features from input waveform.
This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.
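Parameters: - wave (Vector) – Input waveform.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done.
Returns: The matrix of features, where the row-index is the frame index.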
-
compute_features
(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix¶ Computes the PLP features from input waveform.
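Parameters: - wave (Vector) – Input waveform.
- sample_freq (float) – The sampling frequency of the waveform.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done.
Returns: The matrix of features, where the row-index is the frame index.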
-
dim
() → int¶ Returns the feature dimension.
-
from_other
(other:Plp) → Plp¶ Constructs a new Plp object from another.
-
-
class
kaldi.feat.plp.
PlpComputer
¶ PLP computer.
This is the low-level interface for computing PLP features.
Parameters: opts (PlpOptions) – Options for computing PLP features. -
compute
(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)¶ Computes one feature frame from one signal frame.
Parameters: - signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
- signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
- feature (Vector) – Output frame of features.
-
dim
() → int¶ Returns feature dimension.
-
from_other
(other:PlpComputer) → PlpComputer¶ Constructs a new PlpComputer object from another.
-
get_frame_options
() → FrameExtractionOptions¶ Returns frame extraction options.
-
need_raw_log_energy
() → bool¶ Whether raw log energy is added to features.
-
-
class
kaldi.feat.plp.
PlpOptions
¶ Options for computing PLP features.
-
cepstral_lifter
¶ Constant controlling the scaling of PLPs (default=22)
-
cepstral_scale
¶ Scaling constant in PLP computation (default=1.0)
-
compress_factor
¶ Compression factor in PLP computation (default=0.33333)
-
energy_floor
¶ Absolute energy floor used in PLP computation (default=0.0)
-
frame_opts
¶ Options for frame extraction
-
htk_compat
¶ Whether to put energy (or C0) last (default=False)
-
lpc_order
¶ Order of LPC analysis (default=12)
-
mel_opts
¶ Options for Mel banks (default #mel-banks is 23)
-
num_ceps
¶ Number of cepstral coefficients including C0 (default=13)
-
raw_energy
¶ Whether to compute energy before preemphasis and windowing (default=True)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
use_energy
¶ Whether to use energy (not C0) as the zeroth PLP feature (default=True)
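To illustrate what cepstral_lifter and cepstral_scale do, here is a small pure-Python sketch rather than a PyKaldi call. It assumes the sinusoidal lifter 1 + (Q/2)·sin(πi/Q), the weighting used by Kaldi's C++ feature code, applied element-wise to the cepstra before the overall cepstral_scale; both function names are hypothetical.

```python
import math
from typing import List


def lifter_coeffs(num_ceps: int, cepstral_lifter: float) -> List[float]:
    """Sinusoidal lifter weights 1 + (Q / 2) * sin(pi * i / Q), i = 0 .. num_ceps - 1."""
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


def lifter_and_scale(ceps: List[float], cepstral_lifter: float = 22.0,
                     cepstral_scale: float = 1.0) -> List[float]:
    """Hypothetical helper: weight the cepstra by the lifter (if Q != 0), then scale them."""
    out = list(ceps)
    if cepstral_lifter != 0.0:
        weights = lifter_coeffs(len(out), cepstral_lifter)
        out = [c * w for c, w in zip(out, weights)]
    if cepstral_scale != 1.0:
        out = [c * cepstral_scale for c in out]
    return out


# Defaults num_ceps=13, cepstral_lifter=22: C0 keeps weight 1.0, the peak weight (12.0) is on C11.
print([round(w, 2) for w in lifter_coeffs(13, 22.0)])
```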
-
kaldi.feat.signal¶
Functions
convolve_signals |
Does simple non-FFT-based convolution of two signals. |
downsample_wave_form |
Downsamples a waveform. |
fft_based_block_convolve_signals |
Does FFT-based block convolution of two signals using overlap-add method. |
fft_based_convolve_signals |
Does FFT-based convolution of two signals. |
Classes
ArbitraryResample |
Arbitrary resampler. |
LinearResample |
Linear resampler. |
-
class
kaldi.feat.signal.
ArbitraryResample
¶ Arbitrary resampler.
This allows you to resample a signal (assumed zero outside the sample region, not periodic) at arbitrary specified time values, which don’t have to be linearly spaced.
The low-pass filter cutoff “filter_cutoff_hz” should be less than half the sample rate; “num_zeros” should probably be at least two, preferably more. Higher numbers give sharper filters but are less efficient.
-
num_samples_in
¶ Number of samples at input
-
num_samples_out
¶ Number of samples at output
-
resample
(input:MatrixBase, output:MatrixBase)¶ Resamples input signal.
input.num_rows and output.num_rows should be equal and nonzero. input.num_cols should be equal to num_samples_in. output.num_cols should be equal to num_samples_out.
Parameters: - input (MatrixBase) – Input signal.
- output (MatrixBase) – Output signal.
-
resample_vector
(input:VectorBase, output:VectorBase)¶ Resamples input signal.
This version processes just one vector.
-
-
class
kaldi.feat.signal.
LinearResample
¶ Linear resampler.
LinearResample is a special case of ArbitraryResample, where we want to resample a signal at linearly spaced intervals (this means we want to upsample or downsample the signal). It is more efficient than ArbitraryResample because we can construct it just once.
We require that the input and output sampling rates be specified as integers, as this is an easy way to ensure that their ratio is rational.
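The integer-rate requirement can be made concrete with a little arithmetic. The pure-Python sketch below (not the PyKaldi API; both function names are hypothetical) finds the smallest block of input and output samples that repeats exactly, and counts the output samples a fully flushed resampler would produce, assuming output sample k sits at time k / samp_rate_out and that every output time strictly before the end of the input is emitted. It only does the counting; the actual resampler also low-pass filters.

```python
from math import gcd
from typing import Tuple


def resampling_unit(samp_rate_in: int, samp_rate_out: int) -> Tuple[int, int]:
    """Smallest (input, output) sample counts that span the same time interval."""
    base = gcd(samp_rate_in, samp_rate_out)
    return samp_rate_in // base, samp_rate_out // base


def num_output_samples_flushed(num_samples_in: int, samp_rate_in: int,
                               samp_rate_out: int) -> int:
    """Count k >= 0 with k / samp_rate_out < num_samples_in / samp_rate_in."""
    # Cross-multiplying keeps everything in exact integer arithmetic:
    # k * samp_rate_in < num_samples_in * samp_rate_out, so the count is a ceiling division.
    return -(-num_samples_in * samp_rate_out // samp_rate_in)


# 44.1 kHz -> 16 kHz: every 441 input samples map onto exactly 160 output samples.
print(resampling_unit(44100, 16000))                     # (441, 160)
print(num_output_samples_flushed(44100, 44100, 16000))   # 16000
```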
-
resample
(input:VectorBase, flush:bool) → Vector¶ Resamples input signal.
If you call it with flush == True and you have never called it with flush == False, it just resamples the input signal.
You can also use this to process a signal a piece at a time by calling it with flush == False for every piece except the last, which is passed with flush == True.
If you call it with flush == False, it won’t output the last few samples but will remember them, so that if you later give it a second piece of the input signal it can process it correctly.
If your most recent call to the object was with flush == False, it will have internal state; you can remove this by calling reset(). Empty input is acceptable.
Parameters: - input (Vector) – Input signal.
- flush (bool) – Whether to flush output.
Returns: Output signal.
Return type: Vector
-
reset
()¶ Resets the state of the resampler.
-
-
kaldi.feat.signal.
convolve_signals
(filter:Vector, signal:Vector)¶ Does simple non-FFT-based convolution of two signals.
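The FFT-based convolution functions are more efficient and are generally preferable.
Parameters: - filter (Vector) – The filter (impulse response).
- signal (Vector) – The input signal. It is overwritten with the result of the convolution.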
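For reference, the direct method is the textbook double loop over signal samples and filter taps, costing O(N·M) operations for N signal samples and M filter taps. This is a pure-Python sketch rather than the PyKaldi function: it returns a new list instead of overwriting signal, and the full output length len(signal) + len(filter) − 1 is an assumption of the sketch.

```python
from typing import List, Sequence


def direct_convolve(filt: Sequence[float], signal: Sequence[float]) -> List[float]:
    """Full linear convolution: out[n] = sum_k filt[k] * signal[n - k]."""
    if not filt or not signal:
        return []
    out = [0.0] * (len(signal) + len(filt) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(filt):
            out[i + j] += s * h  # each input sample spreads through every filter tap
    return out


print(direct_convolve([1.0, 1.0], [1.0, 2.0, 3.0]))  # [1.0, 3.0, 5.0, 3.0]
```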
-
kaldi.feat.signal.
downsample_wave_form
(orig_freq:float, wave:VectorBase, new_freq:float) → Vector¶ Downsamples a waveform.
This is a convenience wrapper for LinearResample.
-
kaldi.feat.signal.
fft_based_block_convolve_signals
(filter:Vector, signal:Vector)¶ Does FFT-based block convolution of two signals using overlap-add method.
This is an efficient way of computing the discrete convolution of a long signal with a finite impulse response filter.
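Parameters: - filter (Vector) – The filter (impulse response).
- signal (Vector) – The input signal. It is overwritten with the result of the convolution.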
-
kaldi.feat.signal.
fft_based_convolve_signals
(filter:Vector, signal:Vector)¶ Does FFT-based convolution of two signals.
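This is less efficient than fft_based_block_convolve_signals(), as it processes the entire signal with a single FFT.
Parameters: - filter (Vector) – The filter (impulse response).
- signal (Vector) – The input signal. It is overwritten with the result of the convolution.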
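The two FFT-based variants differ only in how the signal is split. The pure-Python sketch below (hypothetical names, a naive recursive radix-2 FFT, no PyKaldi) shows both: fft_convolve zero-pads both inputs to a power of two at least len(signal) + len(filter) − 1 and multiplies their spectra in one shot, while overlap_add_convolve convolves fixed-size blocks of the signal separately and adds the overlapping tails, so each FFT stays small however long the signal is. The block length is a free parameter chosen here for illustration.

```python
import cmath
from typing import List, Sequence


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def fft_convolve(filt: Sequence[float], signal: Sequence[float]) -> List[float]:
    """Single-FFT linear convolution of the whole signal at once."""
    if not filt or not signal:
        return []
    out_len = len(signal) + len(filt) - 1
    n = 1
    while n < out_len:  # pad to a power of two >= out_len so circular == linear convolution
        n *= 2
    fa = fft([complex(v) for v in filt] + [0j] * (n - len(filt)))
    fb = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    prod = fft([a * b for a, b in zip(fa, fb)], inverse=True)
    return [v.real / n for v in prod[:out_len]]  # the 1/n normalizes the inverse FFT


def overlap_add_convolve(filt: Sequence[float], signal: Sequence[float],
                         block_len: int = 256) -> List[float]:
    """Overlap-add: convolve each block separately and sum the overlapping tails."""
    if not filt or not signal:
        return []
    out = [0.0] * (len(signal) + len(filt) - 1)
    for start in range(0, len(signal), block_len):
        block = signal[start:start + block_len]
        for i, v in enumerate(fft_convolve(filt, block)):
            out[start + i] += v  # each block's output overlaps the next by len(filt) - 1
    return out


print(overlap_add_convolve([1.0, 1.0], [1.0, 2.0, 3.0], block_len=2))
```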
kaldi.feat.spectrogram¶
Classes
Spectrogram |
Spectrogram computer. |
SpectrogramComputer |
Spectrogram computer. |
SpectrogramOptions |
Options for computing spectrogram features. |
-
class
kaldi.feat.spectrogram.
Spectrogram
¶ Spectrogram computer.
Parameters: opts (SpectrogramOptions) – Options for computing spectrogram features. -
compute
(wave:VectorBase, vtln_warp:float) → Matrix¶ Computes the spectrogram features from input waveform.
This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.
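Parameters: - wave (Vector) – Input waveform.
- vtln_warp (float) – The VTLN warping factor. Ignored for spectrogram features.
Returns: The matrix of features, where the row-index is the frame index.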
-
compute_features
(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix¶ Computes the spectrogram features from input waveform.
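Parameters: - wave (Vector) – Input waveform.
- sample_freq (float) – The sampling frequency of the waveform.
- vtln_warp (float) – The VTLN warping factor. Ignored for spectrogram features.
Returns: The matrix of features, where the row-index is the frame index.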
-
dim
() → int¶ Returns the feature dimension.
-
from_other
(other:Spectrogram) → Spectrogram¶ Constructs a new Spectrogram object from another.
-
-
class
kaldi.feat.spectrogram.
SpectrogramComputer
¶ Spectrogram computer.
This is the low-level interface for computing spectrogram features.
Parameters: opts (SpectrogramOptions) – Options for computing spectrogram features. -
compute
(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)¶ Computes one feature frame from one signal frame.
Parameters: - signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
- vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
- signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
- feature (Vector) – Output frame of features.
-
dim
() → int¶ Returns feature dimension.
-
from_other
(other:SpectrogramComputer) → SpectrogramComputer¶ Constructs a new SpectrogramComputer object from another.
-
get_frame_options
() → FrameExtractionOptions¶ Returns frame extraction options.
-
need_raw_log_energy
() → bool¶ Whether raw log energy is added to features.
-
-
class
kaldi.feat.spectrogram.
SpectrogramOptions
¶ Options for computing spectrogram features.
-
energy_floor
¶ Absolute energy floor used in spectrogram computation (default=0.0)
-
frame_opts
¶ Options for frame extraction
-
raw_energy
¶ Whether to compute energy before preemphasis and windowing (default=True)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
kaldi.feat.wave¶
-
kaldi.feat.wave.
WAVE_SAMPLE_MAX
= 32768.0¶
Classes
WaveData |
Wave file data. |
WaveInfo |
Wave file header information. |
-
class
kaldi.feat.wave.
WaveData
¶ Wave file data.
-
clear
()¶ Clears the contents.
-
copy_from
(other:WaveData)¶ Copies the contents of another WaveData object to this one.
-
data
() → Matrix¶ Returns wave data matrix.
Data is returned as a matrix because there may be multiple channels. If the wave file is mono, the returned matrix will have just one row.
Returns: the wave data matrix. Return type: Matrix
-
duration
¶ Approximate duration (in seconds).
-
from_data
(samp_freq:float, data:MatrixBase) → WaveData¶ Creates a new WaveData object from a waveform matrix.
-
read
(is:istream)¶ Reads wave file from input stream.
Parameters: is (istream) – Input stream. It should be opened in binary mode.
Raises: RuntimeError – On error.
-
samp_freq
¶ Sample frequency (in Hz).
-
swap
(other:WaveData)¶ Swaps the contents with another WaveData object.
-
-
class
kaldi.feat.wave.
WaveInfo
¶ Wave file header information.
-
block_align
¶ Number of data bytes per sample.
-
data_bytes
¶ Number of data bytes. Invalid if is_streamed is True.
-
duration
¶ Approximate duration (in seconds). Invalid if is_streamed is True.
-
is_streamed
¶ Whether stream size is unknown.
-
num_channels
¶ Number of channels.
-
read
(is:istream)¶ Reads wave file header from input stream.
After header is successfully read, input stream will be positioned at the beginning of the wave data.
Parameters: is (istream) – Input stream. It should be opened in binary mode.
Raises: RuntimeError – On error.
-
reverse_bytes
¶ Whether file byte order is different from machine byte order.
-
samp_freq
¶ Sample frequency (in Hz).
-
sample_count
¶ Number of samples in the stream. Invalid if is_streamed is True.
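For uncompressed PCM data, the WaveInfo fields are related by simple arithmetic. The sketch below uses Python's standard-library wave module instead of PyKaldi to build and read back a small stereo file; the helper names are hypothetical, the dictionary keys only mirror the WaveInfo attribute names, and block_align is taken to be the bytes for one sample across all channels.

```python
import io
import wave


def make_pcm_wav(num_channels: int, samp_freq: int, num_samples: int,
                 bytes_per_channel_sample: int = 2) -> bytes:
    """Build an in-memory, silent PCM WAV file (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(num_channels)
        w.setsampwidth(bytes_per_channel_sample)
        w.setframerate(samp_freq)
        w.writeframes(b"\x00" * (num_samples * num_channels * bytes_per_channel_sample))
    return buf.getvalue()


def header_fields(data: bytes) -> dict:
    """Read a WAV header; keys mirror (a subset of) the WaveInfo attribute names."""
    with wave.open(io.BytesIO(data), "rb") as r:
        num_channels = r.getnchannels()
        samp_freq = r.getframerate()
        sample_count = r.getnframes()  # one "sample" = one value per channel
        block_align = num_channels * r.getsampwidth()
    return {
        "num_channels": num_channels,
        "samp_freq": samp_freq,
        "sample_count": sample_count,
        "block_align": block_align,
        "data_bytes": sample_count * block_align,
        "duration": sample_count / samp_freq,
    }


print(header_fields(make_pcm_wav(num_channels=2, samp_freq=16000, num_samples=1600)))
```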
-
kaldi.feat.window¶
Functions
dither |
Dithers waveform. |
extract_window |
Extracts a windowed frame of waveform. |
first_sample_of_frame |
Computes the index of the first sample of a given frame. |
num_frames |
Computes the number of frames that we can extract from a waveform. |
preemphasize |
Preemphasizes waveform. |
process_window |
Does all the windowing steps after extracting the windowed signal. |
Classes
FeatureWindowFunction |
Windowing function. |
FrameExtractionOptions |
Options for extracting frames. |
-
class
kaldi.feat.window.
FeatureWindowFunction
¶ Windowing function.
-
from_options
(opts:FrameExtractionOptions) → FeatureWindowFunction¶ Creates a windowing function from a FrameExtractionOptions object.
-
from_other
(other:FeatureWindowFunction) → FeatureWindowFunction¶ Creates a windowing function from another FeatureWindowFunction object.
-
window
¶ The window coefficients, one per sample of the window.
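The coefficients depend on window_type. As a hedged, pure-Python sketch (hypothetical function name, not the PyKaldi API), the formulas below follow Kaldi's C++ FeatureWindowFunction, where N is the window size in samples and a = 2π / (N − 1). Note that the default povey window is a Hann window raised to the power 0.85, so, unlike the Hamming window, it reaches zero at both edges.

```python
import math
from typing import List


def window_coeffs(window_type: str, size: int, blackman_coeff: float = 0.42) -> List[float]:
    """Per-sample window weights for a window of `size` samples (hypothetical helper)."""
    if size < 2:
        raise ValueError("window size must be at least 2")
    a = 2.0 * math.pi / (size - 1)
    coeffs = []
    for i in range(size):
        c = math.cos(a * i)
        if window_type == "hanning":
            w = 0.5 - 0.5 * c
        elif window_type == "hamming":
            w = 0.54 - 0.46 * c
        elif window_type == "povey":  # Hann window raised to 0.85; zero at both ends
            w = (0.5 - 0.5 * c) ** 0.85
        elif window_type == "rectangular":
            w = 1.0
        elif window_type == "blackman":
            w = blackman_coeff - 0.5 * c + (0.5 - blackman_coeff) * math.cos(2.0 * a * i)
        else:
            raise ValueError(f"unknown window type: {window_type}")
        coeffs.append(w)
    return coeffs


print([round(w, 3) for w in window_coeffs("povey", 5)])
```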
-
-
class
kaldi.feat.window.
FrameExtractionOptions
¶ Options for extracting frames.
-
allow_downsample
¶ Whether input waveform can have a higher sampling frequency than specified (default=False)
-
blackman_coeff
¶ Constant coefficient for generalized Blackman window (default=0.42)
-
dither
¶ Dithering constant, 0.0 means no dithering (default=1.0)
-
frame_length_ms
¶ Frame length in milliseconds (default=25)
-
frame_shift_ms
¶ Frame shift in milliseconds (default=10)
-
padded_window_size
() → int¶ Returns padded window size in terms of number of samples
-
preemph_coeff
¶ Coefficient for use in signal preemphasis (default=0.97)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
remove_dc_offset
¶ Subtract mean from waveform on each frame (default=True)
-
round_to_power_of_two
¶ Round the window size up to a power of two by zero-padding the FFT input (default=True)
-
samp_freq
¶ Sample frequency of the waveform, in Hz (default=16000)
-
snip_edges
¶ Whether to output only frames that fit in the waveform (default=True)
-
window_shift
() → int¶ Returns window shift in terms of number of samples
-
window_size
() → int¶ Returns window size in terms of number of samples
-
window_type
¶ Type of window, one of hamming, hanning, povey (default), rectangular, blackman
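window_size(), window_shift() and padded_window_size() turn the millisecond options into sample counts. The sketch below reproduces that arithmetic in plain Python with hypothetical names, assuming (as in Kaldi's C++ FrameExtractionOptions) that the products are truncated to integers and that padding rounds up to the next power of two.

```python
def window_size(samp_freq: float, frame_length_ms: float) -> int:
    """Window length in samples, truncated toward zero."""
    return int(samp_freq * 0.001 * frame_length_ms)


def window_shift(samp_freq: float, frame_shift_ms: float) -> int:
    """Frame shift in samples, truncated toward zero."""
    return int(samp_freq * 0.001 * frame_shift_ms)


def padded_window_size(size: int, round_to_power_of_two: bool = True) -> int:
    """FFT input length: the window size, optionally rounded up to a power of two."""
    if not round_to_power_of_two:
        return size
    padded = 1
    while padded < size:
        padded *= 2
    return padded


size = window_size(16000, 25)    # 25 ms at 16 kHz
shift = window_shift(16000, 10)  # 10 ms at 16 kHz
print(size, shift, padded_window_size(size))  # 400 160 512
```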
-
-
kaldi.feat.window.
dither
(waveform:VectorBase, dither_value:float)¶ Dithers waveform.
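Parameters: - waveform (Vector) – The waveform to dither. It is modified in place.
- dither_value (float) – The dithering constant, i.e. the scale of the Gaussian noise added to each sample; 0.0 means no dithering.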
-
kaldi.feat.window.
extract_window
(sample_offset:int, wave:VectorBase, f:int, opts:FrameExtractionOptions, window_function:FeatureWindowFunction, window:Vector)¶ Extracts a windowed frame of waveform.
It does everything done by process_window().
Parameters: - sample_offset (int) – If ‘wave’ is not the entire waveform but part of it to the left has been discarded, the number of samples prior to ‘wave’ that have already been discarded. Set this to zero if you are processing the entire waveform in one piece, or if you get ‘no matching function’ compilation errors when updating the code.
- wave (Vector) – The input waveform
- f (int) – The frame index to be extracted.
- opts (FrameExtractionOptions) – Options for frame extraction.
- window_function (FeatureWindowFunction) – The windowing function. It should have been initialized using ‘opts’.
- window (Vector) – The output windowed waveform (possibly-padded). It will be resized as needed.
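The extraction step itself, minus the processing applied afterwards by process_window(), can be sketched in plain Python. The sketch is hypothetical (not the PyKaldi API) and assumes sample_offset = 0, frame start positions that follow the snip_edges convention of first_sample_of_frame(), and reflection of out-of-range samples back into the waveform, which is how Kaldi's C++ ExtractWindow handles edge frames when snip_edges is False.

```python
from typing import List, Sequence


def extract_frame(wave: Sequence[float], f: int, window_size: int, window_shift: int,
                  padded_size: int, snip_edges: bool = True) -> List[float]:
    """Return frame f of `wave`, zero-padded to padded_size (hypothetical helper)."""
    if not wave:
        raise ValueError("empty waveform")
    if snip_edges:
        start = f * window_shift
    else:  # frame f is centred on sample f * shift + shift // 2
        start = f * window_shift + window_shift // 2 - window_size // 2
    n = len(wave)
    if snip_edges and (start < 0 or start + window_size > n):
        raise ValueError("frame does not fit in the waveform")
    frame = []
    for s in range(window_size):
        idx = start + s
        while idx < 0 or idx >= n:  # reflect: -1 -> 0, -2 -> 1, n -> n - 1, n + 1 -> n - 2
            idx = -idx - 1 if idx < 0 else 2 * n - 1 - idx
        frame.append(float(wave[idx]))
    return frame + [0.0] * (padded_size - window_size)


samples = [float(i) for i in range(10)]
print(extract_frame(samples, 0, 4, 2, 4, snip_edges=False))  # [0.0, 0.0, 1.0, 2.0]
```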
-
kaldi.feat.window.
first_sample_of_frame
(frame:int, opts:FrameExtractionOptions) → int¶ Computes the index of the first sample of a given frame.
Parameters: - frame (int) – The frame index.
- opts (FrameExtractionOptions) – Options for frame extraction.
Returns: The index of the first sample.
Return type: int
-
kaldi.feat.window.
num_frames
(num_samples:int, opts:FrameExtractionOptions, flush:bool=default) → int¶ Computes the number of frames that we can extract from a waveform.
Assumes that the waveform has the sampling rate specified in frame extraction options.
Parameters: - num_samples (int) – The number of samples in the waveform.
- opts (FrameExtractionOptions) – Options for frame extraction.
- flush (bool) – True if we are asserting that this number of samples is ‘all there is’, False if we expect more data to possibly come in. This only makes a difference to the answer if opts.snip_edges == False. For offline feature extraction you always want flush == True. In an online-decoding context, once you know (or decide) that no more data is coming in, you’d call it with flush == True at the end to flush out any remaining data (default=True).
Returns: Number of frames that can be extracted.
Return type: int
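The frame-count rules can be reproduced in plain Python. This sketch is hypothetical (not the PyKaldi API): it takes the window size and shift directly as sample counts instead of a FrameExtractionOptions object, and it assumes the conventions of Kaldi's C++ NumFrames() and FirstSampleOfFrame(), in which frame f is centred on sample f · shift + shift / 2 when snip_edges is False.

```python
def first_sample_of_frame(frame: int, window_size: int, window_shift: int,
                          snip_edges: bool = True) -> int:
    if snip_edges:
        return frame * window_shift
    # Otherwise frame f is centred on sample f * shift + shift // 2 (it may start before 0).
    midpoint = frame * window_shift + window_shift // 2
    return midpoint - window_size // 2


def num_frames(num_samples: int, window_size: int, window_shift: int,
               snip_edges: bool = True, flush: bool = True) -> int:
    if snip_edges:
        # Only frames that lie entirely inside the waveform.
        if num_samples < window_size:
            return 0
        return 1 + (num_samples - window_size) // window_shift
    # Roughly one frame per shift: num_samples / shift, rounded to an integer.
    n = (num_samples + window_shift // 2) // window_shift
    if flush:
        return n
    # Not flushing: drop trailing frames that would need samples we have not seen yet.
    end = first_sample_of_frame(n - 1, window_size, window_shift, snip_edges) + window_size
    while n > 0 and end > num_samples:
        n -= 1
        end -= window_shift
    return n


# 1 s of 16 kHz audio with 25 ms windows (400 samples) and a 10 ms shift (160 samples):
print(num_frames(16000, 400, 160))                    # 98
print(num_frames(16000, 400, 160, snip_edges=False))  # 100
```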
-
kaldi.feat.window.
preemphasize
(waveform:VectorBase, preemph_coeff:float)¶ Preemphasizes waveform.
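Parameters: - waveform (Vector) – The waveform to preemphasize. It is modified in place.
- preemph_coeff (float) – The preemphasis coefficient, between 0.0 and 1.0.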
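A minimal pure-Python sketch of first-order preemphasis, assuming the convention of Kaldi's C++ Preemphasize(): each sample has c times its predecessor subtracted, and the first sample is treated as its own predecessor. The list-based signature is hypothetical.

```python
from typing import List


def preemphasize(waveform: List[float], preemph_coeff: float) -> None:
    """In place: x[i] -= c * x[i - 1] for i > 0, and x[0] -= c * x[0]."""
    if preemph_coeff == 0.0 or not waveform:
        return
    if not 0.0 <= preemph_coeff <= 1.0:
        raise ValueError("preemph_coeff must lie in [0, 1]")
    # Walk backwards so that waveform[i - 1] still holds the original sample.
    for i in range(len(waveform) - 1, 0, -1):
        waveform[i] -= preemph_coeff * waveform[i - 1]
    waveform[0] -= preemph_coeff * waveform[0]  # first sample acts as its own predecessor


samples = [1.0, 2.0, 3.0]
preemphasize(samples, 0.5)
print(samples)  # [0.5, 1.5, 2.0]
```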
-
kaldi.feat.window.
process_window
(opts:FrameExtractionOptions, window_function:FeatureWindowFunction, window:VectorBase)¶ Does all the windowing steps after extracting the windowed signal.
Depending on the configuration, it does dithering, DC offset removal, preemphasis, and multiplication by the windowing function.
Parameters: - opts (FrameExtractionOptions) – Options for frame extraction.
- window_function (FeatureWindowFunction) – The windowing function. It should have been initialized using ‘opts’.
- window (Vector) – The signal window to be processed. Its size should match ‘opts.padded_window_size()’.
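The order of operations described for process_window() can be summarized in a pure-Python sketch. It is an illustration under stated assumptions, not the PyKaldi implementation: the function name and keyword arguments are hypothetical stand-ins for the corresponding FrameExtractionOptions fields, dither defaults to 0.0 here (unlike the options default) to keep the output deterministic, dithering draws from Python's random module instead of Kaldi's generator, and frame and window are plain lists of equal length.

```python
import random
from typing import List, Optional


def process_window(frame: List[float], window: List[float], dither: float = 0.0,
                   remove_dc_offset: bool = True, preemph_coeff: float = 0.97,
                   rng: Optional[random.Random] = None) -> None:
    """Hypothetical in-place sketch: dither, remove DC, preemphasize, apply window."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    if not frame:
        return
    if dither != 0.0:  # add Gaussian noise scaled by the dithering constant
        rng = rng or random.Random()
        for i in range(len(frame)):
            frame[i] += dither * rng.gauss(0.0, 1.0)
    if remove_dc_offset:  # subtract the frame mean
        mean = sum(frame) / len(frame)
        for i in range(len(frame)):
            frame[i] -= mean
    if preemph_coeff != 0.0:  # walk backwards so frame[i - 1] is still unmodified
        for i in range(len(frame) - 1, 0, -1):
            frame[i] -= preemph_coeff * frame[i - 1]
        frame[0] -= preemph_coeff * frame[0]
    for i in range(len(frame)):  # multiply by the windowing function
        frame[i] *= window[i]


frame = [1.0, 2.0, 3.0]
process_window(frame, [1.0, 1.0, 1.0], preemph_coeff=0.0)
print(frame)  # [-1.0, 0.0, 1.0]
```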