kaldi.online2

Online Endpointing

This module contains a simple facility for endpointing that should be used in conjunction with the online decoding code. By endpointing in this context we mean “deciding when to stop decoding”, not generic speech/silence segmentation. The use case we have in mind is a dialog system where, as more speech data comes in, we decode more and more and must decide when to stop decoding.

The endpointing rule is a disjunction of conjunctions. The way we have it configured, it’s an OR of five rules, and each rule has the following form:

(<contains-nonsilence> || !rule.must_contain_nonsilence)
&& <length-of-trailing-silence> >= rule.min_trailing_silence
&& <relative-cost> <= rule.max_relative_cost
&& <utterance-length> >= rule.min_utterance_length

where:

<contains-nonsilence>
is true if the best traceback contains any nonsilence phone;
<length-of-trailing-silence>
is the length in seconds of silence phones at the end of the best traceback (we stop counting when we hit non-silence);
<relative-cost>
is a value >= 0 extracted from the decoder: zero if a final-state of the grammar FST had the best cost at the final frame, infinity if no final-state was active, and > 0 for in-between cases;
<utterance-length>
is the number of seconds of the utterance that we have decoded so far.

All of these pieces of information are obtained from the best-path traceback from the decoder, which is output by the function get_best_path(). We do this every time we’re finished processing a chunk of data.
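
The rule form above can be written out as a small predicate. The following is a minimal pure-Python sketch, not the library's implementation: Rule and rule_activated are hypothetical names that mirror the four fields of OnlineEndpointRule, and the four inputs are assumed to have been computed from the best-path traceback already.

```python
from dataclasses import dataclass
import math


@dataclass
class Rule:
    """Hypothetical stand-in mirroring the fields of OnlineEndpointRule."""
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


def rule_activated(rule, contains_nonsilence, trailing_silence,
                   relative_cost, utterance_length):
    """Evaluate one rule: a conjunction of the four conditions."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


# Illustrative rule: fire after 2 s of trailing silence once something
# other than silence has been decoded, whatever the relative cost.
rule = Rule(True, 2.0, math.inf, 0.0)
fires = rule_activated(rule, True, 2.5, math.inf, 6.0)
silent_only = rule_activated(rule, False, 2.5, math.inf, 6.0)
```

With these inputs the rule fires only when the traceback contains non-silence, because must_contain_nonsilence is set.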

For details of the default rules, see OnlineEndpointConfig.

It’s up to the caller whether to use final-probs or not when generating the best-path, i.e. decoder.get_best_path(use_final_probs=True|False), but we recommend not using them. If you do use them, then depending on the grammar, you may force the best-path to decode non-silence even though that was not what it really preferred to decode.

Functions

decoding_endpoint_detected Determines if we should terminate decoding.
decoding_endpoint_detected_grammar Determines if we should terminate decoding.
endpoint_detected Determines if any of the endpointing rules are active for given arguments.
trailing_silence_length Returns the number of trailing silence frames on the best-path traceback.
trailing_silence_length_grammar Returns the number of trailing silence frames on the best-path traceback.
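
As a rough illustration of what counting trailing silence frames means, here is a pure-Python sketch. It assumes the best-path traceback has already been converted to one phone id per frame, and that silence phones are given as a colon-separated string of integer ids; parse_silence_phones and trailing_silence_frames are hypothetical helpers, not functions of this module.

```python
def parse_silence_phones(spec):
    """Parse a colon-separated list of integer phone ids, e.g. '1:2:3'."""
    return {int(p) for p in spec.split(":") if p}


def trailing_silence_frames(phones_per_frame, silence_phones):
    """Count consecutive silence frames at the end of the sequence."""
    count = 0
    for phone in reversed(phones_per_frame):
        if phone not in silence_phones:
            break  # stop counting at the first non-silence frame
        count += 1
    return count


silence = parse_silence_phones("1:2")
# Hypothetical best-path phone id for each decoded frame.
frames = [1, 1, 7, 7, 9, 1, 2, 2]
n_sil = trailing_silence_frames(frames, silence)
# An assumed frame shift of 0.01 s converts frames to seconds.
sil_seconds = n_sil * 0.01
```

The leading silence frames are ignored; only the run at the end of the traceback counts toward the trailing-silence condition.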

Classes

DecodableDiagGmmScaledOnline Decodable for online decoding with diagonal GMMs.
OnlineEndpointConfig Online endpointing configuration.
OnlineEndpointRule Online endpointing rule.
OnlineFeaturePipeline Online feature pipeline.
OnlineFeaturePipelineCommandLineConfig Command-line configuration options for online feature pipeline.
OnlineFeaturePipelineConfig Configuration options for online feature pipeline.
OnlineGmmAdaptationState Online GMM adaptation state.
OnlineGmmDecodingAdaptationPolicyConfig Configuration options for re-estimating basis-fMLLR during online decoding.
OnlineGmmDecodingConfig Configuration options for online GMM decoding.
OnlineGmmDecodingModels GMM models used for online decoding.
OnlineIvectorExtractionConfig Command-line configuration options for online ivector extraction.
OnlineIvectorExtractionInfo Configuration options for online iVector extraction.
OnlineIvectorExtractorAdaptationState Adaptation state of the online ivector extractor.
OnlineIvectorFeature Online ivector extractor.
OnlineNnetFeaturePipeline Online feature pipeline for neural network decoding.
OnlineNnetFeaturePipelineConfig Command-line configuration options for online neural network feature pipeline.
OnlineNnetFeaturePipelineInfo Configuration options for online neural network feature pipeline.
OnlineSilenceWeighting Online silence weighting.
OnlineSilenceWeightingConfig Configuration options for online silence weighting.
SingleUtteranceGmmDecoder Online lattice-generating decoder for diagonal GMM models.
SingleUtteranceNnetDecoder Online lattice-generating decoder for neural network models.
SingleUtteranceNnetGrammarDecoder Online lattice-generating decoder for neural network models.
class kaldi.online2.DecodableDiagGmmScaledOnline(am, trans_model, scale, input_feats)

Decodable for online decoding with diagonal GMMs.

Parameters:
  • am (AmDiagGmm) – Diagonal GMM.
  • trans_model (TransitionModel) – Transition model.
  • scale (float) – Acoustic scale.
  • input_feats (OnlineFeatureInterface) – Online input features.
is_last_frame(frame:int) → bool

Checks if given frame is the last frame.

log_likelihood(frame:int, index:int) → float

Returns the log-likelihood of the given index for the given frame.

num_frames_ready() → int

Returns number of frames ready for decoding.

num_indices() → int

Returns number of indices.

class kaldi.online2.OnlineEndpointConfig

Online endpointing configuration.

Decoding is terminated if any of the endpointing rules evaluates to True.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
rule1

Default Rule1 times out after 5 seconds of silence even if nothing was decoded.

rule2

Default Rule2 times out after 0.5 seconds of silence if reached final-state with good probability.

rule3

Default Rule3 times out after 1.0 seconds of silence if reached final-state with OK probability.

rule4

Default Rule4 times out after 2.0 seconds of silence after decoding something even if final-state was not reached.

rule5

Default Rule5 times out after the utterance is 20.0 seconds.

silence_phones

Colon separated list of phones to be considered as silence.
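
The five default rules described above combine as a logical OR. The sketch below models that in plain Python under stated assumptions: Rule is a hypothetical stand-in for OnlineEndpointRule, the silence durations and the 20-second limit are taken from the rule descriptions, and the relative-cost thresholds used for "good" (2.0) and "OK" (8.0) are assumed values that should be checked against the actual defaults of the config object.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for OnlineEndpointRule.
Rule = namedtuple("Rule", "must_contain_nonsilence min_trailing_silence "
                          "max_relative_cost min_utterance_length")

# Silence durations and the 20 s limit come from the descriptions above.
# The relative-cost thresholds 2.0 ("good") and 8.0 ("OK") are assumed
# values for illustration; check the real defaults on the config object.
DEFAULT_RULES = [
    Rule(False, 5.0, math.inf, 0.0),   # rule1
    Rule(True, 0.5, 2.0, 0.0),         # rule2
    Rule(True, 1.0, 8.0, 0.0),         # rule3
    Rule(True, 2.0, math.inf, 0.0),    # rule4
    Rule(False, 0.0, math.inf, 20.0),  # rule5
]


def any_rule_activated(rules, contains_nonsilence, trailing_silence,
                       relative_cost, utterance_length):
    """Return True if at least one rule has all its conditions met."""
    return any(
        (contains_nonsilence or not r.must_contain_nonsilence)
        and trailing_silence >= r.min_trailing_silence
        and relative_cost <= r.max_relative_cost
        and utterance_length >= r.min_utterance_length
        for r in rules)


# 0.6 s of trailing silence at a final state with zero relative cost.
quick_stop = any_rule_activated(DEFAULT_RULES, True, 0.6, 0.0, 3.0)
# Same silence, but no final state active: no rule fires yet.
keep_going = any_rule_activated(DEFAULT_RULES, True, 0.6, math.inf, 3.0)
```

In this model a short pause ends the utterance only when the decoder is at a final state with low relative cost; otherwise the longer timeouts of the other rules apply.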

class kaldi.online2.OnlineEndpointRule

Online endpointing rule.

Endpointing rule applies if all of the conditions are satisfied.

Parameters:
  • must_contain_nonsilence (bool) – If true, endpointing rule applies only if best-path traceback contains non-silence.
  • min_trailing_silence (float) – Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.
  • max_relative_cost (float) – Endpointing rule applies only if relative-cost of final-states <= this value.
  • min_utterance_length (float) – Endpointing rule applies only if utterance length (in seconds) >= this value.
max_relative_cost

Endpointing rule applies only if relative-cost of final-states <= this value.

min_trailing_silence

Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.

min_utterance_length

Endpointing rule applies only if utterance length (in seconds) >= this value.

must_contain_nonsilence

If true, endpointing rule applies only if best-path traceback contains non-silence.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
register_with_prefix(prefix:str, opts:OptionsItf)

Registers prefixed options with an object implementing the options interface.

Parameters:
  • prefix (str) – String that will be prepended to option names.
  • opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
class kaldi.online2.OnlineFeaturePipeline

Online feature pipeline.

This class is responsible for putting together the various stages of the feature-processing pipeline in an online setting. It does not attempt to be fully generic; it only handles the common case. Since the online-decoding code needs to “know about” things like CMN and fMLLR in order to do adaptation, it’s hard to make this completely generic.

Parameters:config (OnlineFeaturePipelineConfig) – Configuration options for online feature pipeline.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts more data to process.

It does not actually process the data; it just copies it.

Parameters:
  • sampling_rate (float) – Sampling rate of the waveform. It is needed to assert that it matches the sampling rate given in the config.
  • waveform (Vector) – More data to process.
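
To make the copy-now, process-later behaviour concrete, here is a toy pure-Python model. It is not the OnlineFeaturePipeline class: WaveformBuffer is a hypothetical name, and the 16 kHz rate, 25 ms window, 10 ms shift and the rule that only whole windows count are assumptions of this sketch. The real pipeline may report fewer frames because of splicing or delta context.

```python
class WaveformBuffer:
    """Hypothetical toy buffer; not the OnlineFeaturePipeline class."""

    def __init__(self, sampling_rate=16000.0, frame_length_ms=25.0,
                 frame_shift_ms=10.0):
        self.sampling_rate = sampling_rate
        self.frame_length = int(sampling_rate * frame_length_ms / 1000)
        self.frame_shift = int(sampling_rate * frame_shift_ms / 1000)
        self.samples = []

    def accept_waveform(self, sampling_rate, waveform):
        # Mirror the documented check that the rate matches the config.
        if sampling_rate != self.sampling_rate:
            raise ValueError("sampling rate mismatch")
        self.samples.extend(waveform)  # copy only, no processing here

    def num_frames_ready(self):
        # Only frames that fit entirely inside the buffered audio count.
        n = len(self.samples)
        if n < self.frame_length:
            return 0
        return 1 + (n - self.frame_length) // self.frame_shift


buf = WaveformBuffer()
buf.accept_waveform(16000.0, [0.0] * 160)   # 10 ms: too short for a frame
frames_after_first = buf.num_frames_ready()
buf.accept_waveform(16000.0, [0.0] * 640)   # 50 ms in total
frames_after_second = buf.num_frames_ready()
```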
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

freeze_cmvn()

Freezes CMVN.

Throws:
RuntimeError: If num_frames_ready() == 0.
get_cmvn_state() → OnlineCmvnState

Returns the CMVN state.

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

have_fmllr_transform() → bool

Returns True if an fMLLR transform has been set.

input_finished()

Tells the class that you won’t be providing any more waveform.

This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

new() → OnlineFeaturePipeline

Returns a newly initialized copy.

This does not duplicate all the internal state or the speaker-adaptation state, but gives you a freshly initialized version of this object, as if you had initialized it using the constructor that takes the configuration object. After calling this you may want to call set_cmvn_state() and set_transform().

num_frames_ready() → int

Returns number of frames ready

set_cmvn_state(cmvn_state:OnlineCmvnState)

Sets the CMVN state.

set_transform(transform:MatrixBase)

Sets the fMLLR transform.

Call it with an empty matrix if you want to stop it using any transform.

class kaldi.online2.OnlineFeaturePipelineCommandLineConfig

Command-line configuration options for online feature pipeline.

This configuration class is to set up OnlineFeaturePipelineConfig, which in turn is the configuration class for OnlineFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.
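
The per-part configuration files referred to here are, in Kaldi's usual convention, plain text files with one --name=value option per line and # comments. The sketch below parses that format in pure Python for illustration; parse_kaldi_config is a hypothetical helper, the file contents are made up, and the actual parsing is done by the library's own option parser.

```python
def parse_kaldi_config(text):
    """Parse '--name=value' lines into a dict, ignoring '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("--"):
            raise ValueError("expected a line of the form --x=y: %r" % raw)
        key, sep, value = line[2:].partition("=")
        # A bare flag such as '--add-pitch' is treated as true here.
        options[key] = value if sep else "true"
    return options


# Hypothetical contents of a file like conf/mfcc.conf.
example = """
# MFCC options
--use-energy=false
--sample-frequency=16000  # Hz
"""
opts = parse_kaldi_config(example)
```

All values are kept as strings in this sketch; converting them to the right types is left to whatever consumes the options.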

add_deltas

Append delta features (default=False)

add_pitch

Append pitch features to raw MFCC/PLP features (default=False)

cmvn_config

Configuration file for online CMVN features (e.g. conf/online_cmvn.conf)

delta_config

Configuration file for delta features (e.g. conf/delta.conf)

If not supplied, will not compute delta features; supply empty config to use defaults.

fbank_config

Configuration file for filterbank features (e.g. conf/fbank.conf)

feature_type

Base feature type [mfcc (default), plp, fbank].

global_cmvn_stats_rxfilename

Extended filename for global CMVN stats (e.g. obtained from ‘matrix-sum scp:data/train/cmvn.scp -’)

lda_rxfilename

Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)

mfcc_config

Configuration file for MFCC features (e.g. conf/mfcc.conf)

pitch_config

Configuration file for pitch features (e.g. conf/pitch.conf)

pitch_process_config

Configuration file for post-processing pitch features (e.g. conf/pitch_process.conf)

plp_config

Configuration file for PLP features (e.g. conf/plp.conf)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
splice_config

Configuration file for feature splicing, if done (e.g. prior to LDA)

splice_feats

Splice features with left and right context (default=False)

class kaldi.online2.OnlineFeaturePipelineConfig

Configuration options for online feature pipeline.

This configuration class is responsible for storing the configuration options for OnlineFeaturePipeline. The options can be set either directly in code or indirectly by reading config files on disk via OnlineFeaturePipelineCommandLineConfig.

add_deltas

Append delta features (default=True)

add_pitch

Append pitch features to raw MFCC/PLP features (default=False)

cmvn_opts

Options for online CMVN features

delta_opts

Options for delta features

fbank_opts

Options for filterbank features

feature_type

Base feature type [mfcc (default), plp, fbank]

frame_shift_in_seconds() → float

Returns frame shift in seconds.

from_config(cmdline_config:OnlineFeaturePipelineCommandLineConfig) → OnlineFeaturePipelineConfig

Creates a new OnlineFeaturePipelineConfig from OnlineFeaturePipelineCommandLineConfig.

global_cmvn_stats_rxfilename

Extended filename for global CMVN stats (e.g. obtained from ‘matrix-sum scp:data/train/cmvn.scp -’)

lda_rxfilename

Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)

mfcc_opts

Options for MFCC features

pitch_opts

Options for pitch features

pitch_process_opts

Options for post-processing pitch features

plp_opts

Options for PLP features

splice_feats

Splice features with left and right context (default=False)

splice_opts

Options for feature splicing, if done

class kaldi.online2.OnlineGmmAdaptationState

Online GMM adaptation state.

cmvn_state

Online CMVN state

read(in_stream:istream, binary:bool)

Reads this object from input stream.

spk_stats

Speaker transform stats.

transform

Transform matrix

write(out_stream:ostream, binary:bool)

Writes this object to output stream.

class kaldi.online2.OnlineGmmDecodingAdaptationPolicyConfig

Configuration options for re-estimating basis-fMLLR during online decoding.

adaptation_delay

Delay before first basis-fMLLR adaptation for not-first utterances of each speaker

adaptation_first_utt_delay

Delay before first basis-fMLLR adaptation for first utterance of each speaker

adaptation_first_utt_ratio

Ratio that controls frequency of fMLLR adaptation for first utterance of each speaker

adaptation_ratio

Ratio that controls frequency of fMLLR adaptation for not-first utterances of each speaker

check()

Checks if configuration is valid.

do_adapt(chunk_begin_secs:float, chuck_end_secs:float, is_first_utterance:bool) → bool

Checks if we are scheduled to re-estimate fMLLR.

Parameters:
  • chunk_begin_secs (float) – Chunk begin time in seconds.
  • chuck_end_secs (float) – Chunk end time in seconds.
  • is_first_utterance (bool) – First utterance or not.
Returns:

True if we are scheduled to re-estimate fMLLR in the interval [chunk_begin_secs, chuck_end_secs), False otherwise.

Return type:

bool
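
One way to read the delay and ratio options together is as a geometric schedule of adaptation times: delay, delay * ratio, delay * ratio^2, and so on. The sketch below checks whether any scheduled time falls inside a chunk interval. This schedule is an assumption about how the policy works, the function is a hypothetical pure-Python model, and the delay and ratio values are illustrative rather than the library defaults.

```python
def do_adapt(chunk_begin_secs, chunk_end_secs, delay, ratio):
    """Return True if a scheduled adaptation time falls in the chunk.

    Scheduled times are assumed to form the geometric sequence
    delay, delay * ratio, delay * ratio**2, ...
    The chunk is the half-open interval [chunk_begin_secs, chunk_end_secs).
    """
    if delay <= 0.0 or ratio <= 1.0:
        raise ValueError("need delay > 0 and ratio > 1")
    t = delay
    while t < chunk_begin_secs:
        t *= ratio  # skip scheduled times that precede this chunk
    return t < chunk_end_secs


# Illustrative schedule with delay=2.0 and ratio=1.5: 2.0, 3.0, 4.5, ...
adapt_a = do_adapt(1.5, 2.5, delay=2.0, ratio=1.5)   # contains 2.0
adapt_b = do_adapt(2.5, 2.9, delay=2.0, ratio=1.5)   # between 2.0 and 3.0
adapt_c = do_adapt(2.9, 3.1, delay=2.0, ratio=1.5)   # contains 3.0
```

Under this reading, a larger ratio makes re-estimation progressively rarer as the utterance gets longer.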

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
class kaldi.online2.OnlineGmmDecodingConfig

Configuration options for online GMM decoding.

acoustic_scale

Scaling factor for acoustic likelihoods

adaptation_policy_opts

Options for re-estimating basis-fMLLR during online decoding

basis_opts

Options for basis-fMLLR adaptation

faster_decoder_opts

Options for lattice-generating faster decoder.

fmllr_basis_rxfilename

Extended filename for reading the basis elements for basis-fMLLR.

fmllr_lattice_beam

Beam used in pruning lattices for fMLLR estimation

model_rxfilename

Extended filename for reading the model used to estimate fMLLR transforms.

This is required.

online_alimdl_rxfilename

Extended filename for reading the model trained with online-CMN features.

This is only needed if it is different from model_rxfilename.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
rescore_model_rxfilename

Extended filename for reading the discriminatively trained model.

This is only needed if it is different from model_rxfilename.

silence_phones

Colon-separated list of integer ids of silence phones

silence_weight

Weight applied to silence frames for fMLLR estimation.

This has an effect only if silence_phones option is supplied.

class kaldi.online2.OnlineGmmDecodingModels(config)

GMM models used for online decoding.

This class is used to read, store and give access to the models used for the three phases of decoding (first pass with online-CMN features; the ML models used for estimating transforms; and the discriminatively trained models). It takes care of the logic whereby if, say, the last model isn’t given we default to the second model, and so on, and it interprets the filenames from the config object.

Parameters:config (OnlineGmmDecodingConfig) – Options for online GMM decoding.
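
The fallback logic described above can be summarized in a few lines. This is a sketch of the documented behaviour only, assuming an empty string means "not supplied"; resolve_model_filenames is a hypothetical helper and the filenames are placeholders.

```python
def resolve_model_filenames(model_rxfilename,
                            online_alimdl_rxfilename="",
                            rescore_model_rxfilename=""):
    """Apply the documented fallbacks; an empty string means 'not given'."""
    if not model_rxfilename:
        raise ValueError("model_rxfilename is required")
    return {
        # First pass: model trained with online-CMN features, if any.
        "online_alignment": online_alimdl_rxfilename or model_rxfilename,
        # ML model used to estimate transforms.
        "model": model_rxfilename,
        # Final pass: discriminatively trained model, if any.
        "final": rescore_model_rxfilename or model_rxfilename,
    }


# Placeholder filenames; only the online-CMN model is given separately.
resolved = resolve_model_filenames("final.mdl",
                                   online_alimdl_rxfilename="final.oalimdl")
```

Here the final-pass model falls back to final.mdl because no rescoring model was supplied.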
basis_fmllr_estimate() → BasisFmllrEstimate

Returns the basis elements for basis-fMLLR.

get_final_model() → AmDiagGmm

Returns the discriminatively trained model if one is supplied in the config; otherwise returns the ML-trained model.

get_model() → AmDiagGmm

Returns the ML-trained model used to get transforms.

get_online_alignment_model() → AmDiagGmm

Returns the model trained with online-CMN features if one is supplied in the config; otherwise returns the ML-trained model.

get_transition_model() → TransitionModel

Returns the transition model.

class kaldi.online2.OnlineIvectorExtractionConfig

Command-line configuration options for online ivector extraction.

This class includes configuration variables relating to the online ivector extraction, but not including configuration for the “base feature”, i.e. MFCC/PLP/filterbank, which is an input to this feature.

This configuration class is to set up OnlineIvectorExtractionInfo, which in turn is the configuration class for OnlineIvectorFeature. Instead of taking the options for each part of the online ivector extractor directly, it reads in the configuration file for each part.

cmvn_config_rxfilename

Extended filename for reading the online CMVN configuration file.

diag_ubm_rxfilename

Extended filename for reading the diagonal UBM used for obtaining the posteriors.

global_cmvn_stats_rxfilename

Extended filename for reading the global CMVN stats.

greedy_ivector_extractor

Whether to read ahead as much as we can when computing ivector stats (default=False).

ivector_extractor_rxfilename

Extended filename for reading the ivector extractor.

ivector_period

Online ivector period (default=10).

lda_mat_rxfilename

Extended filename for reading the LDA matrix.

max_count

If nonzero, maximum stats count we allow before scaling down stats (default=0.0).

max_remembered_frames

Largest number of frames to remember between utterances of the same speaker (default=1000).

min_post

Threshold for posterior pruning in ivector extraction (default=0.025).

num_cg_iters

Number of conjugate gradient iterations (default=15).

num_gselect

Maximum number of posteriors to use per frame in ivector extraction (default=5).

posterior_scale

Scale on posteriors used in ivector extraction (default=0.1).

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
splice_config_rxfilename

Extended filename for reading the online frame splicing configuration file.

use_most_recent_ivector

Whether to return the most recent ivector rather than the one for current frame (default=True).

class kaldi.online2.OnlineIvectorExtractionInfo

Configuration options for online iVector extraction.

check()

Checks if configuration options are valid.

cmvn_opts

Online CMVN options

diag_ubm

Diagonal UBM

extractor

Ivector extractor

from_config(config:OnlineIvectorExtractionConfig) → _OnlineIvectorExtractionInfo

Creates a new OnlineIvectorExtractionInfo from an OnlineIvectorExtractionConfig.

global_cmvn_stats

Global CMVN stats

greedy_ivector_extractor

Whether to read ahead as much as we can when computing ivector stats (default=False).

init(config:OnlineIvectorExtractionConfig)

Initializes with the given config.

ivector_period

Online ivector period (default=10).

lda_mat

LDA matrix

max_count

If nonzero, maximum stats count we allow before scaling down stats (default=0.0).

max_remembered_frames

Largest number of frames to remember between utterances of the same speaker (default=1000).

min_post

Threshold for posterior pruning in ivector extraction (default=0.025).

num_cg_iters

Number of conjugate gradient iterations (default=15).

num_gselect

Maximum number of posteriors to use per frame in ivector extraction (default=5).

posterior_scale

Scale on posteriors used in ivector extraction (default=0.1).

splice_opts

Online frame splicing options

use_most_recent_ivector

Whether to return the most recent ivector rather than the one for current frame (default=True).

class kaldi.online2.OnlineIvectorExtractorAdaptationState

Adaptation state of the online ivector extractor.

This class stores the adaptation state from the online ivector extractor, which can help you to initialize the adaptation state for the next utterance of the same speaker in a more informed way.

cmvn_state

Online CMVN state (used for getting posteriors for ivector extraction)

from_info(info:_OnlineIvectorExtractionInfo) → OnlineIvectorExtractorAdaptationState

Creates a new OnlineIvectorExtractorAdaptationState from OnlineIvectorExtractionInfo.

from_other(other:OnlineIvectorExtractorAdaptationState) → OnlineIvectorExtractorAdaptationState

Creates a new OnlineIvectorExtractorAdaptationState from another.

ivector_stats

Stats for online ivector estimation

limit_frames(max_remembered_frames:float, posterior_scale:float)

Limits the frames.

Scales down the stats if needed to ensure the number of frames in the speaker-specific CMVN stats does not exceed max_remembered_frames.
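
The scaling step can be pictured as follows. This is a minimal sketch in plain Python, assuming the stats are simple per-dimension sums with an associated frame count; limit_count is a hypothetical helper, and how posterior_scale enters the real computation is not modelled here.

```python
def limit_count(stats, count, max_count):
    """Scale stats so that the effective frame count is at most max_count."""
    if count <= max_count:
        return list(stats), count
    scale = max_count / count
    return [s * scale for s in stats], count * scale


# 1500 frames of accumulated (made-up) per-dimension sums, capped at 1000.
sums, frames = limit_count([300.0, -150.0], 1500.0, 1000.0)
```

Scaling sums and count by the same factor leaves the per-dimension means unchanged (300/1500 equals 200/1000), so the cap only limits how much weight old data carries relative to new data.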

read(is:istream, binary:bool)

Reads this object from input stream.

write(os:ostream, binary:bool)

Writes this object to output stream.

class kaldi.online2.OnlineIvectorFeature

Online ivector extractor.

This class extracts online ivectors from raw features such as MFCC, PLP or filterbank.

Parameters:
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)

Gets online iVector adaptation state.

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

num_frames() → float

Returns number of frames seen

num_frames_ready() → int

Returns number of frames ready

objf_impr_per_frame() → float

Returns the objective improvement per frame from iVector estimation

set_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)

Sets online iVector adaptation state.

ubm_loglike_per_frame() → float

Returns the UBM log-likelihood per frame

update_frame_weights(delta_weights:list<tuple<int, float>>)

Updates frame weights.

class kaldi.online2.OnlineNnetFeaturePipeline

Online feature pipeline for neural network decoding.

This is a different version of the online feature pipeline specialized for use in neural network decoding with iVectors. Our recipe is that we extract iVectors that will be used as an additional input to the neural network, in addition to a window of several frames of spliced raw features (MFCC, PLP or filterbanks). The iVectors are extracted on top of a (splice+LDA+MLLT) feature pipeline, with the added complication that the GMM posteriors used for the iVector extraction are obtained with a version of the features that has online cepstral mean (and optionally variance) normalization, whereas the stats for iVector are accumulated with a non-mean-normalized version of the features. The idea here is that we want the iVector to learn the mean offset, but we want the posteriors to be somewhat invariant to mean offsets.

Parameters:config (OnlineNnetFeaturePipelineInfo) – Configuration options for online neural network feature pipeline.
accept_waveform(sampling_rate:float, waveform:VectorBase)

Accepts more data to process.

It does not actually process the data; it just copies it.

Parameters:
  • sampling_rate (float) – Sampling rate of the waveform. It is needed to assert that it matches the sampling rate given in the config.
  • waveform (Vector) – More data to process.
dim() → int

Returns feature dimension

frame_shift_in_seconds() → float

Returns frame shift in seconds

get_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)

Gets online iVector adaptation state.

get_frame(frame:int, feat:VectorBase)

Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)

Returns the features for given frame indices

input_feature() → OnlineFeatureInterface

Returns the part of the feature pipeline that would be given as the primary (non-iVector) input to neural network.

input_finished()

Tells the class that you won’t be providing any more waveform.

This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).

is_last_frame(frame:int) → bool

Returns True if this is last frame, otherwise False

ivector_feature() → OnlineIvectorFeature

Returns the ivector-extraction part of the feature pipeline (or None if iVectors are not being used).

num_frames_ready() → int

Returns number of frames ready

set_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)

Sets online iVector adaptation state.

class kaldi.online2.OnlineNnetFeaturePipelineConfig

Command-line configuration options for online neural network feature pipeline.

This configuration class is to set up OnlineNnetFeaturePipelineInfo, which in turn is the configuration class for OnlineNnetFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.

add_pitch

Append pitch features to raw MFCC/PLP features (default=False)

fbank_config

Configuration file for filterbank features (e.g. conf/fbank.conf)

feature_type

Base feature type [mfcc (default), plp, fbank].

ivector_extraction_config

Configuration file for online iVector extraction (e.g. conf/ivector.conf)

mfcc_config

Configuration file for MFCC features (e.g. conf/mfcc.conf)

online_pitch_config

Configuration file for online pitch features (e.g. conf/online_pitch.conf)

plp_config

Configuration file for PLP features (e.g. conf/plp.conf)

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
silence_weighting_config

Options for online silence weighting

class kaldi.online2.OnlineNnetFeaturePipelineInfo

Configuration options for online neural network feature pipeline.

This configuration class is responsible for storing the configuration options for OnlineNnetFeaturePipeline (including the actual LDA and CMVN-stats matrices, and the iVector extractor). The options can be set either directly in code or indirectly by reading config files on disk via OnlineNnetFeaturePipelineConfig.

add_pitch

Append pitch features to raw MFCC/PLP features (default=False)

fbank_opts

Options for filterbank features

feature_type

Base feature type [mfcc (default), plp, fbank]

frame_shift_in_seconds() → float

Returns frame shift in seconds.

from_config(config:OnlineNnetFeaturePipelineConfig) → OnlineNnetFeaturePipelineInfo

Creates a new OnlineNnetFeaturePipelineInfo from OnlineNnetFeaturePipelineConfig.

ivector_dim() → int

Returns iVector dimension.

ivector_extractor_info

Options for online iVector extraction.

mfcc_opts

Options for MFCC features

pitch_opts

Options for pitch features

pitch_process_opts

Options for post-processing pitch features

plp_opts

Options for PLP features

silence_weighting_config

Options for weighting silence in iVector adaptation.

use_ivectors

Use iVectors as an extra input to the neural net

class kaldi.online2.OnlineSilenceWeighting

Online silence weighting.

This class is responsible for keeping track of the best-path traceback from the decoder (efficiently) and computing a weighting of the data based on the classification of frames as silence or non-silence. It also applies a duration limit, so data from a very long run of the same transition-id is weighted down (such runs are often associated with misrecognition or silence).
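
The weighting idea can be sketched in plain Python. The following is an illustration under assumptions, not the class's implementation: frame_weights is a hypothetical function, the traceback is given as one transition-id per frame, the silence test is passed in as a callable, and over-long runs are assumed to receive the silence weight.

```python
from itertools import groupby


def frame_weights(transition_ids, is_silence, silence_weight,
                  max_state_duration):
    """Weight each frame by silence status and run length.

    is_silence maps a transition-id to True/False. Runs of one
    transition-id longer than max_state_duration are assumed to get the
    silence weight; a value of 0 disables that limit in this sketch.
    """
    weights = []
    for tid, run in groupby(transition_ids):
        n = len(list(run))
        too_long = max_state_duration > 0 and n > max_state_duration
        w = silence_weight if (is_silence(tid) or too_long) else 1.0
        weights.extend([w] * n)
    return weights


# Made-up traceback: transition-id 3 is silence, 8 repeats four times.
w = frame_weights([3, 3, 5, 8, 8, 8, 8], lambda t: t == 3,
                  silence_weight=0.1, max_state_duration=2)
```

Only the single frame with transition-id 5 keeps full weight; the silence frames and the over-long run of 8 are both weighted down.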

Parameters:
active() → bool

Returns true if list of silence phones is not empty and silence weight is not 1.0

compute_current_traceback(decoder:LatticeFasterOnlineDecoder)

Computes current traceback.

compute_current_traceback_grammar(decoder:LatticeFasterOnlineGrammarDecoder)

Computes current traceback.

get_delta_weights(num_frames_ready_in:int) → list<tuple<int, float>>

Gets the changes in frame weights.

Parameters:num_frames_ready_in (int) – Number of frames available at the input of the online iVector extractor.
Returns:Delta weights as list of (frame-index, delta-weight) tuples.
Return type:List[Tuple[int, float]]
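
Delta weights express changes relative to weights that were already applied, so only frames whose weight has changed need to be reported. The sketch below computes such a list from two weight vectors. It is a hypothetical pure-Python model; in particular, the assumption that frames without a previous weight start at 1.0 is made for illustration.

```python
def delta_weights(old, new):
    """List (frame-index, delta-weight) pairs for frames whose weight changed.

    Frames missing from `old` are assumed to start at weight 1.0.
    """
    deltas = []
    for frame, weight in enumerate(new):
        previous = old[frame] if frame < len(old) else 1.0
        if weight != previous:
            deltas.append((frame, weight - previous))
    return deltas


# Frame 0 was re-classified as silence and frame 2 is new silence.
deltas = delta_weights(old=[1.0, 1.0], new=[0.0, 1.0, 0.0])
```

A list in this (frame-index, delta-weight) form matches the argument type documented for OnlineIvectorFeature.update_frame_weights().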
class kaldi.online2.OnlineSilenceWeightingConfig

Configuration options for online silence weighting.

active() → bool

Returns true if list of silence phones is not empty and silence weight is not 1.0

max_state_duration

Maximum allowed duration of a single transition-id.

new_data_weight

Scale applied to data for which there is no decoder traceback yet.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
register_with_prefix(prefix:str, opts:OptionsItf)

Registers prefixed options with an object implementing the options interface.

Parameters:
  • prefix (str) – String that will be prepended to option names.
  • opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
silence_phones_str

Colon or comma separated list of integer ids for silence phones.

silence_weight

Weighting factor for silence frames.

class kaldi.online2.SingleUtteranceGmmDecoder

Online lattice-generating decoder for diagonal GMM models.

This class is used for decoding a single utterance in an online fashion using diagonal GMMs.

Parameters:
advance_decoding()

Advances the decoding until there are no more frames to decode.

This may also estimate fMLLR after advancing the decoding, depending on the configuration.

endpoint_detected(config:OnlineEndpointConfig) → bool

Determines if we should terminate decoding of the current utterance.

Parameters:config (OnlineEndpointConfig) – Online endpointing configuration.
Returns:True if an endpointing rule is active.
Return type:bool
estimate_fmllr(end_of_utterance:bool)

Estimates the [basis-]fMLLR transform and applies it to the features.

feature_pipeline() → OnlineFeaturePipeline

Returns the online feature pipeline.

final_relative_cost() → float

Returns the final relative cost.

finalize_decoding()

Finalizes the decoding.

get_adaptation_state(adaptation_state:OnlineGmmAdaptationState)

Returns the adaptation state.

get_best_path(end_of_utterance:bool) → LatticeVectorFst

Gets best path as a lattice.

Parameters:end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:The best path.
Return type:LatticeVectorFst
Raises:RuntimeError – In the unusual circumstances where no tokens survive.
get_lattice(rescore_if_needed:bool, end_of_utterance:bool) → CompactLatticeVectorFst

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters:
  • rescore_if_needed (bool) – If this is True and there is any point in rescoring the state-level lattice, it will rescore the lattice.
  • end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:

The lattice-determinized compact lattice.

Return type:

CompactLatticeVectorFst

Raises:

RuntimeError – In the unusual circumstances where no tokens survive.

have_transform() → bool

Returns True if we already have an fMLLR transform.

class kaldi.online2.SingleUtteranceNnetDecoder

Online lattice-generating decoder for neural network models.

This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models.

advance_decoding()

Advances decoding until there are no more frames to decode.

decoder() → LatticeFasterOnlineDecoder

Returns the underlying decoder object.

Note

The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder.LatticeFasterOnlineDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineDecoder. Hence, it does not support the additional decoder API implemented in Python.

endpoint_detected(config:OnlineEndpointConfig) → bool

Determines whether we should terminate decoding of the current utterance.

Parameters: config (OnlineEndpointConfig) – Online endpointing configuration.
Returns: True if an endpointing rule is active.
Return type: bool
finalize_decoding()

Finalizes decoding.

This method may be optionally called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.

get_best_path(end_of_utterance:bool) → LatticeVectorFst

Gets best path as a lattice.

Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The best path.
Return type: LatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
get_lattice(end_of_utterance:bool) → CompactLatticeVectorFst

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The lattice-determinized compact lattice.
Return type: CompactLatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
num_frames_decoded() → int

Queries the number of frames already decoded.

Returns: The number of frames already decoded.
Return type: int
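
The order in which the methods above are typically called can be sketched as follows. To keep the example runnable without models or audio, it uses a small stand-in class (FakeDecoder, hypothetical) that only mimics the method names documented here; it is not the PyKaldi class, and accept_frames() is an invented placeholder for feeding data through the feature pipeline. The point is the control flow only: advance after each chunk, check for an endpoint, finalize once, then fetch the lattice.

```python
class FakeDecoder:
    """Stand-in with the same method names as the decoder; NOT the real class."""

    def __init__(self, endpoint_after_frames):
        self._frames = 0
        self._pending = 0
        self._endpoint_after = endpoint_after_frames
        self.finalized = False

    def accept_frames(self, n):
        # Hypothetical helper: in a real setup, frames arrive via the
        # online feature pipeline rather than through the decoder.
        self._pending += n

    def advance_decoding(self):
        # Consume every frame that is currently available.
        self._frames += self._pending
        self._pending = 0

    def num_frames_decoded(self):
        return self._frames

    def endpoint_detected(self, config):
        # Toy rule: fire once enough frames have been decoded.
        return self._frames >= self._endpoint_after

    def finalize_decoding(self):
        self.finalized = True

    def get_lattice(self, end_of_utterance):
        return f"lattice({self._frames} frames, final_probs={end_of_utterance})"


def decode_stream(decoder, chunks, endpoint_config=None):
    """Decode chunk by chunk and stop early when an endpoint is detected."""
    for chunk_frames in chunks:
        decoder.accept_frames(chunk_frames)
        decoder.advance_decoding()
        if decoder.endpoint_detected(endpoint_config):
            break
    # Optional extra pruning step, called after the last advance_decoding().
    decoder.finalize_decoding()
    return decoder.get_lattice(end_of_utterance=True)
```

For instance, decode_stream(FakeDecoder(30), [10, 10, 10, 10, 10]) stops after the third chunk, because the toy endpoint rule fires at 30 frames.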
class kaldi.online2.SingleUtteranceNnetGrammarDecoder

Online lattice-generating decoder for neural network models.

This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models.

advance_decoding()

Advances decoding until there are no more frames to decode.

decoder() → LatticeFasterOnlineGrammarDecoder

Returns the underlying decoder object.

Note

The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder_ext.LatticeFasterOnlineGrammarDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineGrammarDecoder. Hence, it does not support the additional decoder API implemented in Python.

endpoint_detected(config:OnlineEndpointConfig) → bool

Determines whether we should terminate decoding of the current utterance.

Parameters: config (OnlineEndpointConfig) – Online endpointing configuration.
Returns: True if an endpointing rule is active.
Return type: bool
finalize_decoding()

Finalizes decoding.

This method may be optionally called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.

get_best_path(end_of_utterance:bool) → LatticeVectorFst

Gets best path as a lattice.

Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The best path.
Return type: LatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
get_lattice(end_of_utterance:bool) → CompactLatticeVectorFst

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The lattice-determinized compact lattice.
Return type: CompactLatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
num_frames_decoded() → int

Queries the number of frames already decoded.

Returns: The number of frames already decoded.
Return type: int
kaldi.online2.decoding_endpoint_detected(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineDecoder) → bool

Determines if we should terminate decoding.

This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.

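Parameters:
  • config (OnlineEndpointConfig) – Online endpointing configuration.
  • tmodel (TransitionModel) – Transition model.
  • frame_shift_in_seconds (float) – Frame shift (in seconds).
  • decoder (LatticeFasterOnlineDecoder) – Decoder.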
Returns:

True if the endpointing rules determine that we should terminate decoding.

Return type:

bool

kaldi.online2.decoding_endpoint_detected_grammar(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineGrammarDecoder) → bool

Determines if we should terminate decoding.

This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.

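Parameters:
  • config (OnlineEndpointConfig) – Online endpointing configuration.
  • tmodel (TransitionModel) – Transition model.
  • frame_shift_in_seconds (float) – Frame shift (in seconds).
  • decoder (LatticeFasterOnlineGrammarDecoder) – Decoder.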
Returns:

True if the endpointing rules determine that we should terminate decoding.

Return type:

bool

kaldi.online2.endpoint_detected(config:OnlineEndpointConfig, num_frames_decoded:int, trailing_silence_frames:int, frame_shift_in_seconds:float, final_relative_cost:float) → bool

Determines whether any of the endpointing rules is active for the given arguments.

Parameters:
  • config (OnlineEndpointConfig) – Online endpointing configuration.
  • num_frames_decoded (int) – Number of frames decoded.
  • trailing_silence_frames (int) – Number of trailing silence frames decoded.
  • frame_shift_in_seconds (float) – Frame shift (in seconds).
  • final_relative_cost (float) – Relative cost of final states.
Returns:

True if the endpointing rules determine that we should terminate decoding.

Return type:

bool
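
The rule form given in the module overview (an OR over rules, each an AND of four conditions) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the library code: the Rule dataclass and function names are hypothetical, the two example rules use made-up thresholds rather than the defaults from OnlineEndpointConfig, and <contains-nonsilence> is approximated as "the utterance is longer than its trailing silence", since only frame counts are available here.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical container mirroring the fields named in the overview."""
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


def rule_activated(rule, trailing_silence, relative_cost, utterance_length):
    # Assumption: any decoded audio before the trailing silence counts
    # as evidence of a nonsilence phone.
    contains_nonsilence = utterance_length > trailing_silence
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected_sketch(rules, num_frames_decoded, trailing_silence_frames,
                             frame_shift_in_seconds, final_relative_cost):
    # Convert frame counts to seconds, then OR over all rules.
    utterance_length = num_frames_decoded * frame_shift_in_seconds
    trailing_silence = trailing_silence_frames * frame_shift_in_seconds
    return any(
        rule_activated(r, trailing_silence, final_relative_cost, utterance_length)
        for r in rules
    )


# Illustrative thresholds only; see OnlineEndpointConfig for the real defaults.
EXAMPLE_RULES = [
    Rule(False, 5.0, math.inf, 0.0),  # long silence, even with no speech
    Rule(True, 1.0, 8.0, 0.0),        # speech, then 1 s of silence, plausible end
]
```

With a 10 ms frame shift, 300 decoded frames ending in 120 silence frames and a relative cost of 3.0 activate the second example rule; the same utterance with only 50 trailing silence frames activates neither.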

kaldi.online2.trailing_silence_length(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineDecoder) → int

Returns the number of trailing silence frames on the best-path traceback.

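Parameters:
  • tmodel (TransitionModel) – Transition model.
  • silence_phones (str) – Phones treated as silence.
  • decoder (LatticeFasterOnlineDecoder) – Decoder.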
kaldi.online2.trailing_silence_length_grammar(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineGrammarDecoder) → int

Returns the number of trailing silence frames on the best-path traceback.

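Parameters:
  • tmodel (TransitionModel) – Transition model.
  • silence_phones (str) – Phones treated as silence.
  • decoder (LatticeFasterOnlineGrammarDecoder) – Decoder.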
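
The quantity that both trailing-silence functions return can be illustrated with a short pure-Python sketch. It makes two assumptions that should be checked against your setup: that silence_phones is a colon-separated string of integer phone ids (for example "1:2:3"), and that the best-path traceback has already been reduced to one phone id per frame. The real functions work from the decoder and transition model instead; the function name below is hypothetical.

```python
def count_trailing_silence(frame_phones, silence_phones):
    """Count silence frames at the end of a per-frame phone sequence.

    frame_phones: list of integer phone ids, one per decoded frame (assumed input).
    silence_phones: colon-separated phone ids, e.g. "1:2:3" (assumed format).
    """
    silence = {int(p) for p in silence_phones.split(":") if p}
    count = 0
    # Walk backwards from the last frame and stop at the first nonsilence phone.
    for phone in reversed(frame_phones):
        if phone not in silence:
            break
        count += 1
    return count
```

Multiplying the returned frame count by the frame shift gives the trailing-silence duration in seconds that the endpointing rules compare against min_trailing_silence.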