kaldi.feat

kaldi.feat.fbank

Classes

Fbank Filterbank computer.
FbankComputer Filterbank computer.
FbankOptions Options for computing filterbank features.
class kaldi.feat.fbank.Fbank

Filterbank computer.

Parameters:opts (FbankOptions) – Options for computing filterbank features.
compute(wave:VectorBase, vtln_warp:float) → Matrix

Computes the filterbank features from input waveform.

This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.

Parameters:
  • wave (Vector) – The input waveform
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

compute_features(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix

Computes the filterbank features from input waveform.

Parameters:
  • wave (Vector) – The input waveform
  • sample_freq (float) – The sampling frequency with which wave is sampled. If sample_freq is higher than the frequency specified in the config, the waveform is downsampled.
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

dim() → int

Returns the feature dimension.

from_other(other:Fbank) → Fbank

Constructs a new Fbank object from another.

class kaldi.feat.fbank.FbankComputer

Filterbank computer.

This is the low-level interface for computing filterbank features.

Parameters:opts (FbankOptions) – Options for computing filterbank features.
compute(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)

Computes one feature frame from one signal frame.

Parameters:
  • signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
  • vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
  • signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
  • feature (Vector) – Output frame of features.
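
The signal_log_energy argument described above can be illustrated with a small pure-Python sketch. It does not call PyKaldi; the helper name raw_log_energy is hypothetical, and sys.float_info.min is used as a stand-in for the "min-positive-float" floor because Python floats are doubles.

```python
import math
import sys


def raw_log_energy(frame):
    """Log-energy of a raw (unwindowed, un-preemphasized) signal frame."""
    energy = sum(x * x for x in frame)
    # Floor the energy so that an all-zero frame does not produce log(0).
    # Assumption: sys.float_info.min stands in for "min-positive-float".
    return math.log(max(energy, sys.float_info.min))


if __name__ == "__main__":
    print(raw_log_energy([0.5, -0.5, 1.0]))
    print(raw_log_energy([0.0, 0.0, 0.0]))
```

A value computed this way would only matter when need_raw_log_energy() returns True; otherwise the argument is ignored.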
dim() → int

Returns feature dimension.

from_other(other:FbankComputer) → FbankComputer

Constructs a new FbankComputer object from another.

get_frame_options() → FrameExtractionOptions

Returns frame extraction options.

need_raw_log_energy() → bool

Whether raw log energy is added to features.

class kaldi.feat.fbank.FbankOptions

Options for computing filterbank features.

energy_floor

Absolute energy floor used in filterbank computation (default=0.0)

frame_opts

Options for frame extraction

htk_compat

Whether to put energy last (default=False)

mel_opts

Options for Mel banks (default #mel-banks is 23)

raw_energy

Whether to compute energy before preemphasis and windowing (default=True)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
use_energy

Whether to add an extra energy dimension to filterbank output (default=False)

use_log_fbank

Whether to compute log-filterbank (default) or linear-filterbank

use_power

Whether to use power (default) or magnitude

kaldi.feat.functions

Functions

compute_deltas Computes delta features.
compute_power_spectrum Converts a complex FFT to a power spectrum.
compute_shift_deltas Computes shifted delta features.
init_idft_bases Initializes IDFT bases.
reverse_frames Reverses frames in time.
sliding_window_cmn Applies sliding-window cepstral mean and/or variance normalization.
splice_frames Splices feature frames.

Classes

DeltaFeatures Delta features computer.
DeltaFeaturesOptions Options for delta computation.
ShiftedDeltaFeatures Shifted delta features computer.
ShiftedDeltaFeaturesOptions Options for shifted delta computation.
SlidingWindowCmnOptions Options for sliding window CMN computation.
class kaldi.feat.functions.DeltaFeatures(opts:DeltaFeaturesOptions)

Delta features computer.

This class provides a low-level function to compute delta features. The function takes as input a matrix of features and a frame index that it should compute the deltas on. It puts its output in an object of type VectorBase, of size (original-feature-dimension) * (opts.order+1). This is not the most efficient way to do the computation, but it’s state-free and thus easier to understand.

Parameters:opts (DeltaFeaturesOptions) – Options for delta computation.
process(input_feats:MatrixBase, frame:int, output_frame:VectorBase)

Computes delta features for given frame.

Parameters:
  • input_feats (Matrix) – Input feature matrix.
  • frame (int) – Frame index.
  • output_frame (Vector) – Output vector representing delta features.
class kaldi.feat.functions.DeltaFeaturesOptions(order:int=2, window:int=2)

Options for delta computation.

Parameters:
  • order (int) – Delta computation order (default=2).
  • window (int) – Delta computation window (default=2). Actual window size is 2 * window + 1.

Note

The behavior at the edges is to replicate the first or last frame.

order

Delta computation order (default=2)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
window

Delta computation window (default=2)
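
To make the window and edge behavior concrete, here is a minimal pure-Python sketch of first-order deltas. It assumes the common regression formula (a weighted sum over 2 * window + 1 frames, normalized by the sum of squared offsets) and replicates the first or last frame at the edges, as the note above describes. The function name first_order_deltas is hypothetical and the sketch does not use PyKaldi.

```python
def first_order_deltas(feats, window=2):
    """Compute first-order deltas for a list of frames (lists of floats)."""
    num_frames = len(feats)
    dim = len(feats[0])
    # Normalizer: sum of squared offsets over the window.
    norm = float(sum(j * j for j in range(-window, window + 1)))
    deltas = []
    for t in range(num_frames):
        row = [0.0] * dim
        for j in range(-window, window + 1):
            # Replicate the first or last frame beyond the edges.
            src = min(max(t + j, 0), num_frames - 1)
            for d in range(dim):
                row[d] += j * feats[src][d]
        deltas.append([v / norm for v in row])
    return deltas


if __name__ == "__main__":
    ramp = [[float(t)] for t in range(6)]
    for frame in first_order_deltas(ramp):
        print(frame)
```

On a linear ramp the interior deltas equal the slope, while frames near the edges are attenuated because of the replicated frames. Higher orders would repeat the same operation on the previous order's output, giving (order + 1) blocks per frame.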

class kaldi.feat.functions.ShiftedDeltaFeatures(opts:ShiftedDeltaFeaturesOptions)

Shifted delta features computer.

This class provides a low-level function to compute shifted delta cepstra (SDC). The function takes as input a matrix of features and a frame index that it should compute the deltas on. It puts its output in an object of type VectorBase, of size original-feature-dimension + (1 * num_blocks).

Parameters:opts (ShiftedDeltaFeaturesOptions) – Options for shifted delta computation.
process(input_feats:MatrixBase, frame:int, output_frame:SubVector)

Computes shifted delta features for given frame.

Parameters:
  • input_feats (Matrix) – Input feature matrix.
  • frame (int) – Frame index.
  • output_frame (Vector) – Output vector representing delta features.
class kaldi.feat.functions.ShiftedDeltaFeaturesOptions

Options for shifted delta computation.

block_shift

Distance between each block (default=3)

num_blocks

Number of blocks in advance of each frame to be concatenated (default=7)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
window

Size of time delay and advance (default=1)

class kaldi.feat.functions.SlidingWindowCmnOptions

Options for sliding window CMN computation.

center

Whether to center the window on the current frame (default=False)

check()

Checks if option values are valid.

Throws:
RuntimeError: If option values are not valid.
cmn_window

Window size for average CMN computation (default=600)

max_warnings

Maximum number of warnings to report per utterance (default=5)

min_window

Minimum CMN window used at start of decoding (default=100)

normalize_variance

Whether to normalize variance to one (default=False)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
kaldi.feat.functions.compute_deltas(delta_opts:DeltaFeaturesOptions, input_features:MatrixBase) → Matrix

Computes delta features.

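Parameters:
  • delta_opts (DeltaFeaturesOptions) – Options for delta computation.
  • input_features (Matrix) – Input feature matrix.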
Returns:

A matrix representing output delta features.

Return type:

Matrix

Note

This convenience function computes delta features for an entire feature matrix. If you want to deal with features coming in frame by frame you can use the DeltaFeatures class.

kaldi.feat.functions.compute_power_spectrum(complex_fft:VectorBase)

Converts a complex FFT to a power spectrum.

If the input complex FFT is a vector of size n (representing half the complex FFT of a real signal of size n), this function overwrites the first (n/2) + 1 elements of it with the energies of the FFT bins from zero to the Nyquist frequency. Contents of the remaining (n/2) - 1 elements are undefined at output.

Parameters:complex_fft (Vector) – Complex FFT to be converted to a power spectrum.
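
The in-place conversion described above can be sketched in pure Python. The packed layout is an assumption of this sketch: element 0 holds the real DC term, element 1 the real Nyquist term, and the remaining elements hold (real, imaginary) pairs for bins 1 to n/2 - 1. The function name power_spectrum_in_place is hypothetical.

```python
def power_spectrum_in_place(packed_fft):
    """Overwrite the first n/2 + 1 entries with per-bin energies."""
    n = len(packed_fft)
    half = n // 2
    # Assumed packing: [re(0), re(n/2), re(1), im(1), re(2), im(2), ...]
    dc_energy = packed_fft[0] ** 2
    nyquist_energy = packed_fft[1] ** 2
    for i in range(1, half):
        re = packed_fft[2 * i]
        im = packed_fft[2 * i + 1]
        # Index i is never read again by later iterations, so this is safe.
        packed_fft[i] = re * re + im * im
    packed_fft[0] = dc_energy
    packed_fft[half] = nyquist_energy
    # Only the first n/2 + 1 entries are meaningful after the call.
    return packed_fft[: half + 1]


if __name__ == "__main__":
    print(power_spectrum_in_place([1.0, 2.0, 3.0, 4.0]))
```

The remaining (n/2) - 1 entries of the input list keep stale values, which mirrors the "undefined at output" remark above.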
kaldi.feat.functions.compute_shift_deltas(delta_opts:ShiftedDeltaFeaturesOptions, input_features:MatrixBase) → Matrix

Computes shifted delta features.

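Parameters:
  • delta_opts (ShiftedDeltaFeaturesOptions) – Options for shifted delta computation.
  • input_features (Matrix) – Input feature matrix.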
Returns:

A matrix representing output delta features.

Return type:

Matrix

Note

This convenience function computes delta features for an entire feature matrix. If you want to deal with features coming in frame by frame you can use the ShiftedDeltaFeatures class.

kaldi.feat.functions.init_idft_bases(n_bases:int, dimension:int) → Matrix

Initializes IDFT bases.

Parameters:
  • n_bases (int) – Number of IDFT bases.
  • dimension (int) – Dimension of each IDFT basis.
Returns:

A matrix representing IDFT bases.

Return type:

Matrix

kaldi.feat.functions.reverse_frames(input_features:MatrixBase) → Matrix

Reverses frames in time.

This function is used for backwards decoding.

Parameters:input_features (Matrix) – Input feature matrix.
Returns:A matrix representing output features.
Return type:Matrix
kaldi.feat.functions.sliding_window_cmn(opts:SlidingWindowCmnOptions, input:MatrixBase, output:MatrixBase)

Applies sliding-window cepstral mean and/or variance normalization.

Input and output feature matrices must have the same dimension.
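
As a rough illustration of sliding-window mean normalization, the following pure-Python sketch subtracts a per-dimension window mean from each frame. It is a simplification under stated assumptions: only the mean is normalized, min_window is ignored, and the exact window boundaries may differ from the library's. The function name sliding_window_cmn_sketch is hypothetical.

```python
def sliding_window_cmn_sketch(feats, cmn_window=600, center=False):
    """Subtract a sliding-window mean from each frame (mean-only sketch)."""
    num_frames = len(feats)
    dim = len(feats[0])
    out = []
    for t in range(num_frames):
        if center:
            start = t - cmn_window // 2
            end = start + cmn_window
        else:
            # Trailing window that ends at the current frame.
            start = t + 1 - cmn_window
            end = t + 1
        if start < 0:
            if center:
                end -= start  # shift a centered window to the right
            start = 0
        if end > num_frames:
            start = max(0, start - (end - num_frames))
            end = num_frames
        count = end - start
        mean = [sum(feats[i][d] for i in range(start, end)) / count
                for d in range(dim)]
        out.append([feats[t][d] - mean[d] for d in range(dim)])
    return out


if __name__ == "__main__":
    feats = [[1.0], [2.0], [3.0], [4.0]]
    print(sliding_window_cmn_sketch(feats, cmn_window=2))
    print(sliding_window_cmn_sketch(feats, cmn_window=2, center=True))
```

The output has the same number of rows and columns as the input, which is why the real function requires matching input and output dimensions.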

kaldi.feat.functions.splice_frames(input_features:MatrixBase, left_context:int, right_context:int) → Matrix

Splices feature frames.

This function is normally used together with LDA. It splices frames together to make a window.

Parameters:
  • input_features (Matrix) – Input feature matrix.
  • left_context (int) – Number of left context frames.
  • right_context (int) – Number of right context frames.
Returns:

A matrix representing output features.

Return type:

Matrix

Throws:
RuntimeError: If input feature matrix is empty.

Note

At the start and end of an utterance, it duplicates the first and last frames. Number of left and right context frames must be non-negative.
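
The splicing behavior described above can be sketched without PyKaldi. In this pure-Python illustration, the hypothetical function splice_frames_sketch concatenates each frame with its left and right neighbors and duplicates the first and last frames at the utterance boundaries.

```python
def splice_frames_sketch(feats, left_context, right_context):
    """Concatenate each frame with its neighbors, clamping at the edges."""
    if not feats:
        raise ValueError("input feature matrix is empty")
    if left_context < 0 or right_context < 0:
        raise ValueError("context sizes must be non-negative")
    num_frames = len(feats)
    spliced = []
    for t in range(num_frames):
        row = []
        for offset in range(-left_context, right_context + 1):
            # Duplicate the first or last frame beyond the utterance edges.
            src = min(max(t + offset, 0), num_frames - 1)
            row.extend(feats[src])
        spliced.append(row)
    return spliced


if __name__ == "__main__":
    print(splice_frames_sketch([[1.0], [2.0], [3.0]], 1, 1))
```

Each output row has dimension (left_context + 1 + right_context) times the input dimension.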

kaldi.feat.mel

Functions

compute_lifter_coeffs Computes liftering coefficients.
compute_lpc Computes LP coefficients from autocorrelation coefficients.
get_equal_loudness_vector Computes equal loudness vector.

Classes

MelBanks Mel filterbanks.
MelBanksOptions Options for Mel filterbanks.
class kaldi.feat.mel.MelBanks

Mel filterbanks.

Parameters:
compute(fft_energies:VectorBase, mel_energies_out:VectorBase)

Computes Mel energies.

Parameters:
  • fft_energies (Vector) – The FFT energies (not log).
  • mel_energies_out (Vector) – Output Mel energies (not log).
from_other(other:MelBanks) → MelBanks

Creates a new MelBanks object from another.

get_center_freqs() → Vector

Returns vector of center frequencies of each Mel bin.

inverse_mel_scale(mel_freq:float) → float

Computes inverse Mel scale for the given Mel-frequency.

mel_scale(freq:float) → float

Computes Mel scale for the given frequency.
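
For reference, a widely used form of the Mel mapping and its inverse is sketched below in pure Python. The constants 1127 and 700 are an assumption of this sketch (the document does not state the formula), and the functions are standalone stand-ins rather than the MelBanks methods.

```python
import math


def mel_scale(freq):
    """Map a frequency in Hz to the Mel scale (assumed constants)."""
    return 1127.0 * math.log(1.0 + freq / 700.0)


def inverse_mel_scale(mel_freq):
    """Map a Mel-scale value back to a frequency in Hz."""
    return 700.0 * (math.exp(mel_freq / 1127.0) - 1.0)


if __name__ == "__main__":
    mel = mel_scale(1000.0)
    print(mel, inverse_mel_scale(mel))
```

The two functions are inverses of each other, so a round trip returns the original frequency up to floating-point error.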

num_bins() → int

Returns number of Mel bins.

vtln_warp_freq(vtln_low_cutoff:float, vtln_high_cutoff:float, low_freq:float, high_freq:float, vtln_warp_factor:float, freq:float) → float

Computes VTLN warp frequency

vtln_warp_mel_freq(vtln_low_cutoff:float, vtln_high_cutoff:float, low_freq:float, high_freq:float, vtln_warp_factor:float, mel_freq:float) → float

Computes VTLN warp Mel-frequency

class kaldi.feat.mel.MelBanksOptions

Options for Mel filterbanks.

Parameters:num_bins (int) – Number of triangular Mel-frequency bins (default=25).
debug_mel

Print out debugging information for Mel bin computation (default=False)

high_freq

High cutoff frequency for Mel bins, if 0 no cutoff, if < 0 offset from Nyquist (default=0).
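
The three cases for high_freq can be resolved as in the following pure-Python sketch. The helper name effective_high_freq and its sample_freq argument are hypothetical; the sketch only restates the rule above (0 means no cutoff, a negative value is an offset from the Nyquist frequency).

```python
def effective_high_freq(sample_freq, high_freq=0.0):
    """Resolve the high cutoff in Hz from the high_freq option."""
    nyquist = 0.5 * sample_freq
    if high_freq <= 0.0:
        # 0 selects the Nyquist frequency; negative values are offsets from it.
        return nyquist + high_freq
    return high_freq


if __name__ == "__main__":
    print(effective_high_freq(16000.0))
    print(effective_high_freq(16000.0, -400.0))
    print(effective_high_freq(16000.0, 7000.0))
```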

htk_mode

Enables more exact compatibility with HTK (default=False)

low_freq

Low cutoff frequency for Mel bins (default=20)

num_bins

Number of triangular Mel-frequency bins (default=25)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
vtln_high

High inflection point in piecewise linear VTLN warping function, if < 0 offset from high_freq (default=-500)

vtln_low

Low inflection point in piecewise linear VTLN warping function (default=100)
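
A piecewise linear warping function with two inflection points can be sketched as follows. This is one plausible formulation, written in pure Python under the assumption that the warp is the identity outside [low_freq, high_freq], scales by 1 / vtln_warp_factor between the inflection points, and is continuous at both of them. It mirrors the argument order of vtln_warp_freq but is not the library implementation.

```python
def vtln_warp_freq_sketch(vtln_low_cutoff, vtln_high_cutoff, low_freq,
                          high_freq, vtln_warp_factor, freq):
    """Piecewise linear VTLN warp of a frequency in Hz (assumed form)."""
    if freq < low_freq or freq > high_freq:
        return freq  # frequencies outside the Mel-bin range are untouched
    # Inflection points, adjusted so the warped range stays inside the band.
    low = vtln_low_cutoff * max(1.0, vtln_warp_factor)
    high = vtln_high_cutoff * min(1.0, vtln_warp_factor)
    scale = 1.0 / vtln_warp_factor
    warped_low = scale * low
    warped_high = scale * high
    # Slopes of the outer segments, chosen so the function is continuous.
    scale_left = (warped_low - low_freq) / (low - low_freq)
    scale_right = (high_freq - warped_high) / (high_freq - high)
    if freq < low:
        return low_freq + scale_left * (freq - low_freq)
    if freq < high:
        return scale * freq
    return high_freq + scale_right * (freq - high_freq)


if __name__ == "__main__":
    for f in (50.0, 1000.0, 7300.0):
        print(f, vtln_warp_freq_sketch(100.0, 7100.0, 20.0, 7600.0, 1.1, f))
```

With a warp factor of 1.0 all three segments have slope 1, so the function reduces to the identity.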

kaldi.feat.mel.compute_lifter_coeffs(Q:float, coeffs:VectorBase)

Computes liftering coefficients.

Coefficients are numbered slightly differently from HTK. The zeroth index is C0, which is not affected.

Parameters:
  • Q (float) – Liftering constant.
  • coeffs (Vector) – Output liftering coefficients.
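
A common sinusoidal liftering window that leaves C0 untouched can be sketched in pure Python as below. The formula 1 + (Q / 2) * sin(pi * i / Q) is an assumption of this sketch, and the function returns a new list instead of filling an output vector.

```python
import math


def lifter_coeffs_sketch(Q, num_coeffs):
    """Return liftering coefficients; index 0 (C0) gets a factor of 1.0."""
    return [1.0 + 0.5 * Q * math.sin(math.pi * i / Q)
            for i in range(num_coeffs)]


if __name__ == "__main__":
    for i, c in enumerate(lifter_coeffs_sketch(22.0, 13)):
        print(i, c)
```

Multiplying each cepstral coefficient by the corresponding factor applies the lifter; with Q = 22 the factor peaks at index 11.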
kaldi.feat.mel.compute_lpc(autocorr_in:VectorBase, lpc_out:Vector) → float

Computes LP coefficients from autocorrelation coefficients.

Parameters:
  • autocorr_in (Vector) – Input autocorrelation coefficients.
  • lpc_out (Vector) – Output LP coefficients. Its size should match len(autocorr_in) - 1.
Returns:

The log energy of residual.

Return type:

float
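
LP coefficients are typically obtained from autocorrelation coefficients with the Levinson-Durbin recursion. The pure-Python sketch below shows the textbook recursion under two assumptions: the coefficients a_k follow the convention x[t] + sum_k a_k * x[t-k] = e[t], and a small floor of 1e-5 guards against a vanishing residual. It returns both outputs instead of writing into lpc_out, and it expects autocorr[0] to be positive.

```python
import math


def compute_lpc_sketch(autocorr):
    """Levinson-Durbin recursion; returns (lpc, log residual energy)."""
    order = len(autocorr) - 1
    lpc = [0.0] * order
    energy = autocorr[0]
    for i in range(order):
        # Reflection coefficient for this order.
        k = autocorr[i + 1]
        for j in range(i):
            k += lpc[j] * autocorr[i - j]
        k /= energy
        # Update the residual energy, with a guard for constant signals.
        energy *= max(1.0 - k * k, 1e-5)
        # Update the coefficients from the previous order.
        updated = lpc[:]
        updated[i] = -k
        for j in range(i):
            updated[j] = lpc[j] - k * lpc[i - j - 1]
        lpc = updated
    return lpc, math.log(energy)


if __name__ == "__main__":
    print(compute_lpc_sketch([1.0, 0.5, 0.25]))
```

The example input is the autocorrelation of a first-order process with coefficient 0.5, so only the first LP coefficient is non-zero.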

kaldi.feat.mel.get_equal_loudness_vector(mel_banks:MelBanks) → Vector

Computes equal loudness vector.

Parameters:mel_banks (MelBanks) – Mel filterbanks.
Returns:Equal loudness vector.
Return type:Vector

kaldi.feat.mfcc

Classes

Mfcc MFCC computer.
MfccComputer MFCC computer.
MfccOptions Options for computing MFCC features.
class kaldi.feat.mfcc.Mfcc

MFCC computer.

Parameters:opts (MfccOptions) – Options for computing MFCC features.
compute(wave:VectorBase, vtln_warp:float) → Matrix

Computes the MFCC features from input waveform.

This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.

Parameters:
  • wave (Vector) – The input waveform
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

compute_features(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix

Computes the MFCC features from input waveform.

Parameters:
  • wave (Vector) – The input waveform
  • sample_freq (float) – The sampling frequency with which wave is sampled. If sample_freq is higher than the frequency specified in the config, the waveform is downsampled.
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

dim() → int

Returns the feature dimension.

from_other(other:Mfcc) → Mfcc

Constructs a new Mfcc object from another.

class kaldi.feat.mfcc.MfccComputer

MFCC computer.

This is the low-level interface for computing MFCC features.

Parameters:opts (MfccOptions) – Options for computing MFCC features.
compute(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)

Computes one feature frame from one signal frame.

Parameters:
  • signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
  • vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
  • signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
  • feature (Vector) – Output frame of features.
dim() → int

Returns feature dimension.

from_other(other:MfccComputer) → MfccComputer

Constructs a new MfccComputer object from another.

get_frame_options() → FrameExtractionOptions

Returns frame extraction options.

need_raw_log_energy() → bool

Whether raw log energy is added to features.

class kaldi.feat.mfcc.MfccOptions

Options for computing MFCC features.

cepstral_lifter

Constant controlling the scaling of MFCCs (default=22)

energy_floor

Absolute energy floor used in MFCC computation (default=0.0)

frame_opts

Options for frame extraction

htk_compat

Whether to put energy (or sqrt(2)*C0) last (default=False)

mel_opts

Options for Mel banks (default #mel-banks is 23)

num_ceps

Number of cepstral coefficients including C0 (default=13)

raw_energy

Whether to compute energy before preemphasis and windowing (default=True)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
use_energy

Whether to use energy (not C0) in MFCC computation (default=True)

kaldi.feat.online

Classes

OnlineAppendFeature Online feature concatenation.
OnlineCacheFeature Online feature caching.
OnlineCmvn Online CMVN features.
OnlineCmvnOptions Options for online CMVN.
OnlineCmvnState Online CMVN state.
OnlineDeltaFeature Online delta features.
OnlineFbank Online filterbank features extractor.
OnlineMatrixFeature Online matrix features.
OnlineMfcc Online MFCC features extractor.
OnlinePlp Online PLP features extractor.
OnlineSpliceFrames Online feature splicing.
OnlineSpliceOptions Options for online feature splicing.
OnlineTransform Online feature transformation.
class kaldi.feat.online.OnlineAppendFeature

Online feature concatenation.

Parameters:
  • src1 (OnlineFeatureInterface) – First source of online features.
  • src2 (OnlineFeatureInterface) – Second source of online features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineCacheFeature

Online feature caching.

Parameters:src (OnlineFeatureInterface) – Source online features.
clear_cache()

Clears feature cache.

dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineCmvn

Online CMVN features.

Parameters:
  • opts (OnlineCmvnOptions) – Options for online CMVN features.
  • cmvn_state (OnlineCmvnState) – Online CMVN state.
  • src (OnlineFeatureInterface) – Source online features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

freeze(cur_frame:int)

Freezes online CMVN state

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

get_state(cur_frame:int) → OnlineCmvnState

Returns the online CMVN state for current frame

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

set_state(cmvn_state:OnlineCmvnState)

Sets online CMVN state

without_state(opts:OnlineCmvnOptions, src:OnlineFeatureInterface) → OnlineCmvn

Creates a new OnlineCmvn object

class kaldi.feat.online.OnlineCmvnOptions

Options for online CMVN.

check()

Checks if options are valid.

Throws:
RuntimeError: If options are not valid.
cmn_window

Number of frames of sliding context for CMN (default=600)

global_frames

Number of global frames to use in CMN for first utterance of a speaker (default=200)

modulus

Relates to how CMVN is computed internally (default=20)

normalize_mean

If True, do cepstral mean normalization (default=True)

normalize_variance

If True, do cepstral mean and variance normalization (default=False)

register(po:ParseOptions)

Registers options with a command-line option parser.

Parameters:po (ParseOptions) – Command-line option parser.
ring_buffer_size

Size of ring buffer used for caching CMVN stats (default=20)

skip_dims

Colon-separated list of dimensions to skip (default="")

speaker_frames

Number of frames from this speaker to use in CMN (default=6000)

class kaldi.feat.online.OnlineCmvnState

Online CMVN state.

from_other(other:OnlineCmvnState) → OnlineCmvnState

Constructs a new OnlineCmvnState object from another.

from_stats(global_stats:DoubleMatrix) → OnlineCmvnState

Constructs a new OnlineCmvnState object from global stats matrix.

frozen_state

If nonempty, CMVN stats representing the frozen state.

global_cmvn_stats

Global CMVN stats

read(is:istream, binary:bool)

Reads online CMVN stats from input stream.

speaker_cmvn_stats

Total CMVN stats for this speaker

write(os:ostream, binary:bool)

Writes online CMVN stats to output stream.

class kaldi.feat.online.OnlineDeltaFeature

Online delta features.

Parameters:
  • opts (DeltaFeaturesOptions) – Options for delta features.
  • src (OnlineFeatureInterface) – Source online features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineFbank

Online filterbank features extractor.

Parameters:opts (FbankOptions) – Options for extracting filterbank features.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts input waveform for feature extraction.

dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

input_finished()

Marks input as finished.

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineMatrixFeature

Online matrix features.

Parameters:mat (Matrix) – Source feature matrix.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready
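
The online feature classes in this module share the same polling methods (dim, num_frames_ready, get_frame, is_last_frame). The following pure-Python sketch shows how a consumer might drain such a source. MatrixFeatureStandIn is a hypothetical stand-in that mimics the method names listed above using plain lists; it is not the PyKaldi class.

```python
class MatrixFeatureStandIn:
    """Hypothetical stand-in exposing the online feature method names."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def dim(self):
        return len(self._rows[0])

    def num_frames_ready(self):
        return len(self._rows)

    def is_last_frame(self, frame):
        return frame == len(self._rows) - 1

    def get_frame(self, frame, feat):
        # Write into the caller-provided buffer, like the real interface.
        feat[:] = self._rows[frame]


def drain(source):
    """Read every ready frame from an online-style feature source."""
    frames = []
    frame = 0
    while frame < source.num_frames_ready():
        feat = [0.0] * source.dim()
        source.get_frame(frame, feat)
        frames.append(feat)
        if source.is_last_frame(frame):
            break
        frame += 1
    return frames


if __name__ == "__main__":
    print(drain(MatrixFeatureStandIn([[1.0, 2.0], [3.0, 4.0]])))
```

With a live source such as an extractor fed through accept_waveform, num_frames_ready can grow between calls, so a real consumer would typically repeat this loop until is_last_frame returns True.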

class kaldi.feat.online.OnlineMfcc

Online MFCC features extractor.

Parameters:opts (MfccOptions) – Options for extracting MFCC features.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts input waveform for feature extraction.

dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

input_finished()

Marks input as finished.

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlinePlp

Online PLP features extractor.

Parameters:opts (PlpOptions) – Options for extracting PLP features.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts input waveform for feature extraction.

dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

input_finished()

Marks input as finished.

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineSpliceFrames

Online feature splicing.

Parameters:
  • opts (OnlineSpliceOptions) – Options for online feature splicing.
  • src (OnlineFeatureInterface) – Source online features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.online.OnlineSpliceOptions

Options for online feature splicing.

left_context

Left-context for frame splicing prior to LDA (default=4)

register(po:ParseOptions)

Registers options with a command-line option parser.

Parameters:po (ParseOptions) – Command-line option parser.
right_context

Right-context for frame splicing prior to LDA (default=4)

class kaldi.feat.online.OnlineTransform

Online feature transformation.

Parameters:
  • transform (Matrix) – Feature transformation matrix.
  • src (OnlineFeatureInterface) – Source online features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

kaldi.feat.pitch

Functions

compute_and_process_kaldi_pitch Computes and postprocesses pitch features.
compute_kaldi_pitch Computes pitch features.
process_pitch Postprocesses pitch features.

Classes

OnlinePitchFeature Online pitch feature extractor.
OnlineProcessPitch Online pitch postprocessor.
PitchExtractionOptions Options for pitch extraction.
ProcessPitchOptions Options for pitch postprocessing.
class kaldi.feat.pitch.OnlinePitchFeature

Online pitch feature extractor.

Parameters:opts (PitchExtractionOptions) – Options for pitch extraction.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts input waveform for feature extraction.

dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

input_finished()

Marks input as finished.

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.pitch.OnlineProcessPitch

Online pitch postprocessor.

Parameters:
  • opts (ProcessPitchOptions) – Options for pitch postprocessing.
  • src (OnlineFeatureInterface) – Source pitch features.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames_ready() → int

Returns number of frames ready

class kaldi.feat.pitch.PitchExtractionOptions

Options for pitch extraction.

delta_pitch

Smallest relative change in pitch that the algorithm measures (default=0.005)

frame_length_ms

Frame length in milliseconds (default=25)

frame_shift_ms

Frame shift in milliseconds (default=10)

frames_per_chunk

Used for compatibility with online decoding (default=0)

lowpass_cutoff

Cutoff frequency for low pass filter in Hz (default=1000)

lowpass_filter_width

Width of low pass filter, larger gives sharper filter (default=1)

max_f0

max F0 to search for in Hz (default=400)

max_frames_latency

Max number of frames of latency allowed (default=0)

min_f0

min F0 to search for in Hz (default=50)

nccf_ballast

Increasing this factor reduces NCCF for quiet frames (default=7000)

nccf_ballast_online

Useful for debug. Affects how NCCF ballast is computed (default=False)

nccf_window_shift

NCCF window shift

nccf_window_size

NCCF window size

penalty_factor

Cost factor for F0 change (default=0.1)

preemph_coeff

Coefficient for use in signal preemphasis (default=0.0)

recompute_frame

Used for compatibility with online decoding (default=500)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
resample_freq

Frequency that we downsample the signal to. Must be more than twice lowpass_cutoff (default=4000)

samp_freq

Sample frequency, must match the waveform (default=16000)

simulate_first_pass_online

Output features that correspond to what an online decoder would see (default=False)

snip_edges

Whether to omit features for incomplete frames near edges (default=True)

soft_min_f0

min F0, applied in a soft way, must not exceed min_f0 (default=10)

upsample_filter_width

Width of filter used in upsampling NCCF (default=5)

class kaldi.feat.pitch.ProcessPitchOptions

Options for pitch postprocessing.

add_delta_pitch

If true, time derivative of log-pitch is added to output features (default=True)

add_normalized_log_pitch

If true, the normalized-log-pitch is added to output features (default=True)

add_pov_feature

If true, the warped NCCF is added to output features (default=True)

add_raw_log_pitch

If true, log(pitch) is added to output features (default=False)

delay

Number of frames by which the pitch information is delayed (default=0)

delta_pitch_noise_stddev

Standard deviation for noise we add to the delta log-pitch (default=0.005)

delta_pitch_scale

Term to scale the final delta log-pitch feature (default=10.0)

delta_window

Number of frames on each side of central frame, to use for delta window (default=2)

normalization_left_context

Left-context (in frames) for moving window normalization (default=275)

normalization_right_context

Right-context (in frames) for moving window normalization (default=75)

pitch_scale

Scaling factor for the final normalized log-pitch value (default=2.0)

pov_offset

Offset for final POV feature, useful in online decoding (default=0.0)

pov_scale

Scaling factor for final POV (probability of voicing) feature (default=2.0)

register(po:ParseOptions)

Registers options with a command-line option parser.

Parameters:po (ParseOptions) – Command-line option parser.
kaldi.feat.pitch.compute_and_process_kaldi_pitch(pitch_opts:PitchExtractionOptions, process_opts:ProcessPitchOptions, wave:VectorBase) → Matrix

Computes and postprocesses pitch features.

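Parameters:
  • pitch_opts (PitchExtractionOptions) – Options for pitch extraction.
  • process_opts (ProcessPitchOptions) – Options for pitch postprocessing.
  • wave (Vector) – The input waveform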
Returns:

Postprocessed pitch features.

Return type:

Matrix

kaldi.feat.pitch.compute_kaldi_pitch(opts:PitchExtractionOptions, wave:VectorBase) → Matrix

Computes pitch features.

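Parameters:
  • opts (PitchExtractionOptions) – Options for pitch extraction.
  • wave (Vector) – The input waveform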
Returns:

Pitch features.

Return type:

Matrix

kaldi.feat.pitch.process_pitch(opts:ProcessPitchOptions, input:MatrixBase) → Matrix

Postprocesses pitch features.

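Parameters:
  • opts (ProcessPitchOptions) – Options for pitch postprocessing.
  • input (Matrix) – The pitch features to postprocess.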
Returns:

Postprocessed pitch features.

Return type:

Matrix

kaldi.feat.plp

Classes

Plp PLP computer.
PlpComputer PLP computer.
PlpOptions Options for computing PLP features.
class kaldi.feat.plp.Plp

PLP computer.

Parameters:opts (PlpOptions) – Options for computing PLP features.
compute(wave:VectorBase, vtln_warp:float) → Matrix

Computes the PLP features from input waveform.

This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.

Parameters:
  • wave (Vector) – The input waveform
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

compute_features(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix

Computes the PLP features from input waveform.

Parameters:
  • wave (Vector) – The input waveform
  • sample_freq (float) – The sampling frequency with which wave is sampled. If sample_freq is higher than the frequency specified in the config, the waveform is downsampled.
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

dim() → int

Returns the feature dimension.

from_other(other:Plp) → Plp

Constructs a new Plp object from another.

class kaldi.feat.plp.PlpComputer

PLP computer.

This is the low-level interface for computing PLP features.

Parameters:opts (PlpOptions) – Options for computing PLP features.
compute(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)

Computes one feature frame from one signal frame.

Parameters:
  • signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
  • vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
  • signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
  • feature (Vector) – Output frame of features.
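
The signal_log_energy argument described above can be illustrated with a small plain-Python sketch. The helper name raw_log_energy is hypothetical, and sys.float_info.min (the smallest positive normal double) stands in for the library's min-positive-float, which is an assumption made for illustration.

```python
import math
import sys


def raw_log_energy(frame):
    """Log-energy of an unprocessed frame, floored at log(min positive float)."""
    energy = sum(x * x for x in frame)
    # The floor avoids log(0) on silent frames. Using the double-precision
    # minimum here is an assumption; the library may use a different constant.
    return math.log(max(energy, sys.float_info.min))


frame = [0.5, -0.5, 0.5, -0.5]
log_energy = raw_log_energy(frame)             # energy is 4 * 0.25 = 1.0
silent_log_energy = raw_log_energy([0.0] * 4)  # falls back to the floor
```

The value would be computed on the raw frame, before windowing and pre-emphasis modify it.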
dim() → int

Returns feature dimension.

from_other(other:PlpComputer) → PlpComputer

Constructs a new PlpComputer object from another.

get_frame_options() → FrameExtractionOptions

Returns frame extraction options.

need_raw_log_energy() → bool

Whether raw log energy is added to features.

class kaldi.feat.plp.PlpOptions

Options for computing PLP features.

cepstral_lifter

Constant controlling the scaling of PLPs (default=22)

cepstral_scale

Scaling constant in PLP computation (default=1.0)

compress_factor

Compression factor in PLP computation (default=0.33333)

energy_floor

Absolute energy floor used in PLP computation (default=0.0)

frame_opts

Options for frame extraction

htk_compat

Whether to put energy (or C0) last (default=False)

lpc_order

Order of LPC analysis (default=12)

mel_opts

Options for Mel banks (default #mel-banks is 23)

num_ceps

Number of cepstral coefficients including C0 (default=13)

raw_energy

Whether to compute energy before preemphasis and windowing (default=True)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
use_energy

Whether to use energy (not C0) in PLP computation (default=True)
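
To give a feel for what cepstral_lifter controls, the sketch below computes the widely used HTK-style sinusoidal liftering weights. That the library applies exactly this formula is an assumption, and lifter_coeffs is a hypothetical helper name.

```python
import math


def lifter_coeffs(cepstral_lifter, num_ceps):
    """Sinusoidal liftering weights 1 + (Q / 2) * sin(pi * i / Q)."""
    q = float(cepstral_lifter)
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


# Defaults from the options above: cepstral_lifter=22, num_ceps=13.
coeffs = lifter_coeffs(cepstral_lifter=22, num_ceps=13)
```

Under this formula C0 is left unscaled (weight 1.0), while higher-order coefficients are boosted, peaking at index Q/2.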

kaldi.feat.signal

Functions

convolve_signals Does simple non-FFT-based convolution of two signals.
downsample_wave_form Downsamples a waveform.
fft_based_block_convolve_signals Does FFT-based block convolution of two signals using overlap-add method.
fft_based_convolve_signals Does FFT-based convolution of two signals.

Classes

ArbitraryResample Arbitrary resampler.
LinearResample Linear resampler.
class kaldi.feat.signal.ArbitraryResample

Arbitrary resampler.

This allows you to resample a signal (assumed zero outside the sample region, not periodic) at arbitrary specified time values, which don’t have to be linearly spaced.

The low-pass filter cutoff “filter_cutoff_hz” should be less than half the sample rate. “num_zeros” should be at least two and preferably more; higher values give sharper filters but are less efficient.

num_samples_in

Number of samples at input

num_samples_out

Number of samples at output

resample(input:MatrixBase, output:MatrixBase)

Resamples input signal.

input.num_rows and output.num_rows should be equal and nonzero. input.num_cols should be equal to num_samples_in. output.num_cols should be equal to num_samples_out.

Parameters:
  • input (MatrixBase) – Input signal.
  • output (MatrixBase) – Output signal.
resample_vector(input:VectorBase, output:VectorBase)

Resamples input signal.

This version processes just one vector.
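
The roles of filter_cutoff_hz and num_zeros can be sketched with a Hann-windowed sinc filter evaluated at arbitrary output times. This is a plain-Python illustration under the assumption that the resampler uses a filter of this general shape; filter_func and resample_at are hypothetical names, not part of this module.

```python
import math


def filter_func(t, filter_cutoff_hz, num_zeros):
    """Hann-windowed sinc low-pass filter evaluated at time offset t (seconds)."""
    half_width = num_zeros / (2.0 * filter_cutoff_hz)
    if abs(t) < half_width:
        window = 0.5 * (1.0 + math.cos(2.0 * math.pi * filter_cutoff_hz / num_zeros * t))
    else:
        window = 0.0  # outside the support of the window
    if t != 0.0:
        sinc = math.sin(2.0 * math.pi * filter_cutoff_hz * t) / (math.pi * t)
    else:
        sinc = 2.0 * filter_cutoff_hz  # limit of the sinc term at t = 0
    return sinc * window


def resample_at(samples, samp_rate_hz, times, filter_cutoff_hz, num_zeros):
    """Evaluates the low-pass filtered signal at each requested time (seconds)."""
    out = []
    for t_out in times:
        acc = 0.0
        for n, x in enumerate(samples):
            acc += x * filter_func(t_out - n / samp_rate_hz, filter_cutoff_hz, num_zeros)
        out.append(acc / samp_rate_hz)
    return out


# A constant signal sampled at 16 kHz, evaluated at a time off the sample grid.
constant = [1.0] * 32
value = resample_at(constant, 16000.0, [16.3 / 16000.0], 4000.0, 4)[0]
```

A larger num_zeros widens the window, so each output sample depends on more input samples; this is where the extra cost of sharper filters comes from.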

class kaldi.feat.signal.LinearResample

Linear resampler.

LinearResample is a special case of ArbitraryResample, where we want to resample a signal at linearly spaced intervals (this means we want to upsample or downsample the signal). It is more efficient than ArbitraryResample because we can construct it just once.

We require that the input and output sampling rate be specified as integers, as this is an easy way to specify that their ratio be rational.
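
One consequence of integer rates is that the pair can be reduced to lowest terms, which gives the period after which the input and output sample grids line up again. The sketch below shows the arithmetic; reduced_rates is a hypothetical helper, and whether the library stores these exact quantities is an assumption.

```python
import math


def reduced_rates(samp_rate_in_hz, samp_rate_out_hz):
    """Reduces an integer rate pair to lowest terms."""
    base = math.gcd(samp_rate_in_hz, samp_rate_out_hz)
    return samp_rate_in_hz // base, samp_rate_out_hz // base


# 44.1 kHz to 16 kHz: every 441 input samples map to 160 output samples.
in_unit, out_unit = reduced_rates(44100, 16000)
```

Because the pattern repeats, filter weights only need to be computed for one such unit, which is consistent with the efficiency claim above.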

resample(input:VectorBase, flush:bool) → Vector

Resamples input signal.

If you call it with flush == True and you have never called it with flush == False, it just resamples the input signal.

You can also use this to process a signal a piece at a time by calling it with flush == False except for the last piece.

If you call it with flush == False, it won’t output the last few samples but will remember them, so that if you later give it a second piece of the input signal it can process it correctly.

If your most recent call to the object was with flush == False, it will have internal state; you can remove this by calling reset().

Empty input is acceptable.

Parameters:
  • input (Vector) – Input signal.
  • flush (bool) – Whether to flush output.
Returns:

Output signal.

Return type:

Vector

reset()

Resets the state of the resampler.

kaldi.feat.signal.convolve_signals(filter:Vector, signal:Vector)

Does simple non-FFT-based convolution of two signals.

It is suggested to use the FFT-based convolution function which is more efficient.

Parameters:
  • filter (Vector) – The filter.
  • signal (Vector) – The signal.
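
For reference, a direct (non-FFT) convolution can be sketched in plain Python as below. This is an illustration of the technique, not the library's implementation; convolve_direct is a hypothetical name, and it returns a new list rather than following the library's calling convention.

```python
def convolve_direct(filt, signal):
    """Full linear convolution; output length is len(signal) + len(filt) - 1."""
    out = [0.0] * (len(signal) + len(filt) - 1)
    for i, f in enumerate(filt):
        for j, s in enumerate(signal):
            out[i + j] += f * s
    return out


result = convolve_direct([1.0, -1.0], [1.0, 2.0, 3.0])
# result == [1.0, 1.0, 1.0, -3.0]
```

The nested loops cost O(len(filt) * len(signal)) operations, which is why the FFT-based variants are suggested for long inputs.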
kaldi.feat.signal.downsample_wave_form(orig_freq:float, wave:VectorBase, new_freq:float) → Vector

Downsamples a waveform.

This is a convenience wrapper for LinearResample.

kaldi.feat.signal.fft_based_block_convolve_signals(filter:Vector, signal:Vector)

Does FFT-based block convolution of two signals using overlap-add method.

This is an efficient way of computing the discrete convolution of a long signal with a finite impulse response filter.

Parameters:
  • filter (Vector) – The filter.
  • signal (Vector) – The signal.
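
The overlap-add structure can be sketched as follows. To stay dependency-free, each block is convolved directly here instead of with an FFT, so this shows only the block splitting and the summing of overlapping tails; overlap_add, convolve_block, and block_size are hypothetical names.

```python
def convolve_block(filt, block):
    """Direct convolution of one block (stands in for the per-block FFT step)."""
    out = [0.0] * (len(block) + len(filt) - 1)
    for i, f in enumerate(filt):
        for j, s in enumerate(block):
            out[i + j] += f * s
    return out


def overlap_add(filt, signal, block_size):
    """Convolves a long signal block by block, adding the overlapping tails."""
    out = [0.0] * (len(signal) + len(filt) - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        partial = convolve_block(filt, block)
        # Each partial result is len(filt) - 1 samples longer than its block,
        # so its tail overlaps the next block's output and is summed in.
        for k, v in enumerate(partial):
            out[start + k] += v
    return out


filt = [1.0, 2.0, 1.0]
signal = [1.0, 0.0, 2.0, -1.0, 3.0, 0.0, 1.0, 4.0]
blockwise = overlap_add(filt, signal, block_size=3)
```

The result matches a single full convolution of the same inputs, while each block only requires a transform of roughly block_size + len(filt) points.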
kaldi.feat.signal.fft_based_convolve_signals(filter:Vector, signal:Vector)

Does FFT-based convolution of two signals.

This is less efficient than fft_based_block_convolve_signals() as it processes the entire signal with a single FFT.

Parameters:
  • filter (Vector) – The filter.
  • signal (Vector) – The signal.

kaldi.feat.spectrogram

Classes

Spectrogram Spectrogram computer.
SpectrogramComputer Spectrogram computer.
SpectrogramOptions Options for computing spectrogram features.
class kaldi.feat.spectrogram.Spectrogram

Spectrogram computer.

Parameters:opts (SpectrogramOptions) – Options for computing spectrogram features.
compute(wave:VectorBase, vtln_warp:float) → Matrix

Computes the spectrogram features from input waveform.

This interface for computing features requires that the user has already checked that the sampling frequency of the waveform is equal to the sampling frequency specified in the frame extraction options.

Parameters:
  • wave (Vector) – The input waveform
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

compute_features(wave:VectorBase, sample_freq:float, vtln_warp:float) → Matrix

Computes the spectrogram features from input waveform.

Parameters:
  • wave (Vector) – The input waveform
  • sample_freq (float) – The sampling frequency with which wave is sampled. If sample_freq is higher than the frequency specified in the config, the waveform is downsampled.
  • vtln_warp (float) – The VTLN warping factor (normally 1.0).
Returns:

The matrix of features, where the row-index is the frame index.

dim() → int

Returns the feature dimension.

from_other(other:Spectrogram) → Spectrogram

Constructs a new Spectrogram object from another.

class kaldi.feat.spectrogram.SpectrogramComputer

Spectrogram computer.

This is the low-level interface for computing spectrogram features.

Parameters:opts (SpectrogramOptions) – Options for computing spectrogram features.
compute(signal_log_energy:float, vtln_warp:float, signal_frame:VectorBase, feature:VectorBase)

Computes one feature frame from one signal frame.

Parameters:
  • signal_log_energy (float) – The log-energy of the signal frame prior to windowing and pre-emphasis, or log(min-positive-float), whichever is greater. Ignored if need_raw_log_energy() returns False.
  • vtln_warp (float) – The VTLN warping factor. Normally 1.0, meaning no warping is to be done. This value is ignored for feature types that don’t support VTLN, such as spectrogram features.
  • signal_frame (Vector) – One frame of the signal. The frame vector is overwritten with intermediate values during computation to avoid new memory allocation.
  • feature (Vector) – Output frame of features.
dim() → int

Returns feature dimension.

from_other(other:SpectrogramComputer) → SpectrogramComputer

Constructs a new SpectrogramComputer object from another.

get_frame_options() → FrameExtractionOptions

Returns frame extraction options.

need_raw_log_energy() → bool

Whether raw log energy is added to features.

class kaldi.feat.spectrogram.SpectrogramOptions

Options for computing spectrogram features.

energy_floor

Absolute energy floor used in spectrogram computation (default=0.0)

frame_opts

Options for frame extraction

raw_energy

Whether to compute energy before preemphasis and windowing (default=True)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
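
As a rough picture of what a spectrogram feature frame contains, the sketch below computes the log power spectrum of one already-windowed frame with a naive DFT. It is a plain-Python illustration, not the library's code path; log_power_spectrum and its floor argument are hypothetical, and the floor is not the energy_floor option above.

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-10):
    """Log power in bins 0..N/2 of a real frame, via a naive O(N^2) DFT."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        # The floor only guards against log(0); its value is illustrative.
        bins.append(math.log(max(power, floor)))
    return bins


# A unit impulse has a flat spectrum, so every bin has power 1 and log power 0.
spectrum = log_power_spectrum([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

A frame of N samples yields N/2 + 1 bins, so with the default 512-sample padded window a frame would have 257 values.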

kaldi.feat.wave

kaldi.feat.wave.WAVE_SAMPLE_MAX = 32768.0

Classes

WaveData Wave file data.
WaveInfo Wave file header information.
class kaldi.feat.wave.WaveData

Wave file data.

clear()

Clears the contents.

copy_from(other:WaveData)

Copies the contents of another WaveData object to this one.

data() → Matrix

Returns wave data matrix.

Data is returned as a matrix because there may be multiple channels. If wave file is mono, the returned matrix will have just one row.

Returns:the wave data matrix.
Return type:Matrix
duration

Approximate duration (in seconds).

from_data(samp_freq:float, data:MatrixBase) → WaveData

Creates a new WaveData object from a waveform matrix.

read(is:istream)

Reads wave file from input stream.

Parameters:is (istream) – Input stream. It should be opened in binary mode.
Throws:
RuntimeError: on error.
samp_freq

Sample frequency (in Hz).

swap(other:WaveData)

Swaps the contents with another WaveData object.

write(os:ostream)

Writes wave file to output stream.

Parameters:os (ostream) – Output stream. It should be opened in binary mode.
Throws:
RuntimeError: on error.
class kaldi.feat.wave.WaveInfo

Wave file header information.

block_align

Number of data bytes per sample.

data_bytes

Number of data bytes. Invalid if is_streamed is True.

duration

Approximate duration (in seconds). Invalid if is_streamed is True.

is_streamed

Whether stream size is unknown.

num_channels

Number of channels.

read(is:istream)

Reads wave file header from input stream.

After header is successfully read, input stream will be positioned at the beginning of the wave data.

Parameters:is (istream) – Input stream. It should be opened in binary mode.
Throws:
RuntimeError: on error.
reverse_bytes

Whether file byte order is different from machine byte order.

samp_freq

Sample frequency (in Hz).

sample_count

Number of samples in stream. Invalid if is_streamed is True.
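
How the header fields relate to one another can be shown with Python's standard wave module, used here in place of WaveInfo so that the example runs without this package. The variable names mirror the attributes above, but the mapping (for example block_align as channels times bytes per sample, and sample_count as samples per channel) is an assumption.

```python
import io
import struct
import wave

# Write a short 16-bit stereo file into memory so the example is self-contained.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    writer.setframerate(8000)
    writer.writeframes(struct.pack("<8h", 0, 0, 100, -100, 200, -200, 300, -300))

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    num_channels = reader.getnchannels()
    samp_freq = reader.getframerate()
    sample_count = reader.getnframes()               # samples per channel
    block_align = num_channels * reader.getsampwidth()

data_bytes = sample_count * block_align
duration = sample_count / samp_freq                  # seconds
```

For a streamed file whose size is unknown, the size-dependent values (data_bytes, sample_count, duration) cannot be derived from the header this way.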

kaldi.feat.window

Functions

dither Dithers waveform.
extract_window Extracts a windowed frame of waveform.
first_sample_of_frame Computes the index of the first sample of a given frame.
num_frames Computes the number of frames that we can extract from a waveform.
preemphasize Preemphasizes waveform.
process_window Does all the windowing steps after extracting the windowed signal.

Classes

FeatureWindowFunction Windowing function.
FrameExtractionOptions Options for extracting frames.
class kaldi.feat.window.FeatureWindowFunction

Windowing function.

from_options(opts:FrameExtractionOptions) → FeatureWindowFunction

Creates a windowing function from a FrameExtractionOptions object.

from_other(other:FeatureWindowFunction) → FeatureWindowFunction

Creates a windowing function from another.

window

Window

class kaldi.feat.window.FrameExtractionOptions

Options for extracting frames.

allow_downsample

Whether input waveform can have a higher sampling frequency than specified (default=False)

blackman_coeff

Constant coefficient for generalized Blackman window (default=0.42)

dither

Dithering constant, 0.0 means no dithering (default=1.0)

frame_length_ms

Frame length in milliseconds (default=25)

frame_shift_ms

Frame shift in milliseconds (default=10)

padded_window_size() → int

Returns padded window size in terms of number of samples

preemph_coeff

Coefficient for use in signal preemphasis (default=0.97)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
remove_dc_offset

Subtract mean from waveform on each frame (default=True)

round_to_power_of_two

Round window size to power of two by zero-padding FFT input (default=True)

samp_freq

Sample frequency (default=16000)

snip_edges

Whether to output only frames that fit in the waveform (default=True)

window_shift() → int

Returns window shift in terms of number of samples

window_size() → int

Returns window size in terms of number of samples

window_type

Type of window, one of hamming, hanning, povey (default), rectangular, blackman
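
The relationship between the millisecond options and the sample counts returned by window_shift(), window_size() and padded_window_size() can be sketched as below. The functions are hypothetical plain-Python stand-ins, and the truncation toward zero is an assumption about the rounding used.

```python
def window_shift(samp_freq, frame_shift_ms):
    """Frame shift in samples."""
    return int(samp_freq * frame_shift_ms / 1000.0)


def window_size(samp_freq, frame_length_ms):
    """Frame length in samples."""
    return int(samp_freq * frame_length_ms / 1000.0)


def padded_window_size(size, round_to_power_of_two=True):
    """Frame length after optional zero-padding to the next power of two."""
    if not round_to_power_of_two:
        return size
    padded = 1
    while padded < size:
        padded *= 2
    return padded


# Defaults from the options above: 16 kHz, 25 ms frames, 10 ms shift.
shift = window_shift(16000, 10)     # 160
size = window_size(16000, 25)       # 400
padded = padded_window_size(size)   # 512
```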

kaldi.feat.window.dither(waveform:VectorBase, dither_value:float)

Dithers waveform.

Parameters:
  • waveform (Vector) – Waveform to be dithered.
  • dither_value (float) – Dithering constant.
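
Dithering is commonly implemented by adding small Gaussian noise to every sample. The sketch below assumes zero-mean, unit-variance noise scaled by dither_value, modifies the list in place, and takes an explicit random generator (a hypothetical extra argument) so that runs are repeatable.

```python
import random


def dither(waveform, dither_value, rng):
    """Adds Gaussian noise scaled by dither_value to each sample, in place."""
    if dither_value == 0.0:
        return  # 0.0 means no dithering
    for i in range(len(waveform)):
        waveform[i] += rng.gauss(0.0, 1.0) * dither_value


rng = random.Random(0)
original = [0.0, 100.0, -100.0, 50.0]

untouched = list(original)
dither(untouched, 0.0, rng)

noisy = list(original)
dither(noisy, 1.0, rng)
```

The noise keeps exactly silent input from producing log(0) in later log-energy or log-spectrum steps.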
kaldi.feat.window.extract_window(sample_offset:int, wave:VectorBase, f:int, opts:FrameExtractionOptions, window_function:FeatureWindowFunction, window:Vector)

Extracts a windowed frame of waveform.

In addition to extracting the frame, it performs every step done by process_window().

Parameters:
  • sample_offset (int) – If ‘wave’ is not the entire waveform, but part of it to the left has been discarded, then the number of samples prior to ‘wave’ that we have already discarded. Set this to zero if you are processing the entire waveform in one piece, or if you get ‘no matching function’ compilation errors when updating the code.
  • wave (Vector) – The input waveform
  • f (int) – The frame index to be extracted.
  • opts (FrameExtractionOptions) – Options for frame extraction.
  • window_function (FeatureWindowFunction) – The windowing function. It should have been initialized using ‘opts’.
  • window (Vector) – The output windowed waveform (possibly-padded). It will be resized as needed.
kaldi.feat.window.first_sample_of_frame(frame:int, opts:FrameExtractionOptions) → int

Computes the index of the first sample of a given frame.

Parameters:
  • frame (int) – The frame index.
  • opts (FrameExtractionOptions) – Options for frame extraction.
Returns:

The index of the first sample.

Return type:

int
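The index computation can be sketched in plain Python as follows. This reflects how the calculation is understood to work in Kaldi's C++ code and should be treated as an assumption; the function takes the shift and size in samples directly instead of an options object.

```python
def first_sample_of_frame(frame, window_shift, window_size, snip_edges=True):
    """Index of the first sample of a frame, relative to the waveform start."""
    if snip_edges:
        # Frames start at multiples of the shift and lie fully inside the wave.
        return frame * window_shift
    # Otherwise frames are centered on the middle of each shift interval.
    midpoint = window_shift * frame + window_shift // 2
    return midpoint - window_size // 2


start_snipped = first_sample_of_frame(3, 160, 400, snip_edges=True)    # 480
start_centered = first_sample_of_frame(0, 160, 400, snip_edges=False)  # -120
```

A negative result means the frame begins before the first sample of the waveform, which can only happen when snip_edges is False.
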

kaldi.feat.window.num_frames(num_samples:int, opts:FrameExtractionOptions, flush:bool=default) → int

Computes the number of frames that we can extract from a waveform.

Assumes that the waveform has the sampling rate specified in frame extraction options.

Parameters:
  • num_samples (int) – The number of samples in the waveform.
  • opts (FrameExtractionOptions) – Options for frame extraction.
  • flush (bool) – True if we are asserting that this number of samples is ‘all there is’, False if we are expecting more data to possibly come in. This only makes a difference to the answer if opts.snip_edges == False. For offline feature extraction you always want flush == True. In an online-decoding context, once you know (or decide) that no more data is coming in, you’d call it with flush == True at the end to flush out any remaining data (default=True).
Returns:

Number of frames that can be extracted.

Return type:

int
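The frame count for the offline case (flush == True) can be sketched as below. The formulas reflect the usual Kaldi behavior as understood here and are an assumption; the function takes the shift and size in samples directly, and the flush == False branch is not covered.

```python
def num_frames(num_samples, window_shift, window_size, snip_edges=True):
    """Number of extractable frames, assuming flush == True."""
    if snip_edges:
        # Only frames that fit entirely inside the waveform are counted.
        if num_samples < window_size:
            return 0
        return 1 + (num_samples - window_size) // window_shift
    # Without edge snipping, the count depends only on the shift.
    return (num_samples + window_shift // 2) // window_shift


# One second of 16 kHz audio with 25 ms frames and a 10 ms shift.
frames_snipped = num_frames(16000, 160, 400, snip_edges=True)     # 98
frames_unsnipped = num_frames(16000, 160, 400, snip_edges=False)  # 100
```
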

kaldi.feat.window.preemphasize(waveform:VectorBase, preemph_coeff:float)

Preemphasizes waveform.

Parameters:
  • waveform (Vector) – Waveform to be preemphasized.
  • preemph_coeff (float) – Preemphasis coefficient.
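
Pre-emphasis is a first-order high-pass filter, y[i] = x[i] - c * x[i-1]. The in-place sketch below iterates backwards so that each step still sees the unmodified previous sample. Treating the first sample as its own predecessor is an assumption about the boundary handling.

```python
def preemphasize(waveform, preemph_coeff):
    """Applies y[i] = x[i] - coeff * x[i-1] in place."""
    if preemph_coeff == 0.0 or not waveform:
        return
    # Walk from the end so waveform[i - 1] is still the original value.
    for i in range(len(waveform) - 1, 0, -1):
        waveform[i] -= preemph_coeff * waveform[i - 1]
    # Boundary: the first sample has no predecessor, so it reuses itself.
    waveform[0] -= preemph_coeff * waveform[0]


samples = [2.0, 4.0, 8.0]
preemphasize(samples, 0.5)
# samples == [1.0, 3.0, 6.0]
```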
kaldi.feat.window.process_window(opts:FrameExtractionOptions, window_function:FeatureWindowFunction, window:VectorBase)

Does all the windowing steps after extracting the windowed signal.

Depending on the configuration, it does dithering, dc offset removal, preemphasis, and multiplication by the windowing function.

Parameters:
  • opts (FrameExtractionOptions) – Options for frame extraction.
  • window_function (FeatureWindowFunction) – The windowing function. It should have been initialized using ‘opts’.
  • window (Vector) – The signal window to be processed. Its size should match ‘opts.padded_window_size()’.
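
The sequence of steps described above can be sketched in plain Python. Dithering is left out to keep the result deterministic, a Hamming window stands in for the configured window type, and the window function is assumed to have the same length as the frame. All names here are hypothetical.

```python
import math


def hamming(n):
    """Hamming window coefficients of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def process_window(window, window_function, remove_dc_offset=True, preemph_coeff=0.97):
    """DC removal, pre-emphasis, then windowing, all in place."""
    n = len(window)
    if remove_dc_offset:
        mean = sum(window) / n
        for i in range(n):
            window[i] -= mean
    if preemph_coeff != 0.0:
        for i in range(n - 1, 0, -1):
            window[i] -= preemph_coeff * window[i - 1]
        window[0] -= preemph_coeff * window[0]
    for i in range(n):
        window[i] *= window_function[i]


coeffs = hamming(8)

# A constant frame carries only a DC offset, so nothing is left after removal.
constant_frame = [3.0] * 8
process_window(constant_frame, coeffs)

# With the optional steps disabled, only the window multiplication remains.
ramp = [float(i) for i in range(8)]
process_window(ramp, coeffs, remove_dc_offset=False, preemph_coeff=0.0)
```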