kaldi.online2¶

Online Endpointing¶

This module contains a simple facility for endpointing that should be used in conjunction with the online decoding code. By endpointing in this context we mean “deciding when to stop decoding”, and not generic speech/silence segmentation. The use case that we have in mind is some kind of dialog system where, as more speech data comes in, we decode more and more, and we have to decide when to stop decoding.

The endpointing rule is a disjunction of conjunctions. The way we have it configured, it’s an OR of five rules, and each rule has the following form:

(<contains-nonsilence> || !rule.must_contain_nonsilence)
&& <length-of-trailing-silence> >= rule.min_trailing_silence
&& <relative-cost> <= rule.max_relative_cost
&& <utterance-length> >= rule.min_utterance_length

where:

<contains-nonsilence>: is true if the best traceback contains any nonsilence phone;
<length-of-trailing-silence>: is the length in seconds of silence phones at the end of the best traceback (we stop counting when we hit non-silence),
<relative-cost>: is a value >= 0 extracted from the decoder, that is zero if a final-state of the grammar FST had the best cost at the final frame, and infinity if no final-state was active (and >0 for in-between cases).
<utterance-length>: is the number of seconds of the utterance that we have decoded so far.

All of these pieces of information are obtained from the best-path traceback from the decoder, which is output by the function get_best_path(). We do this every time we’re finished processing a chunk of data.
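The rule form given above can be sketched in plain Python. This is only an illustration of the conjunction: the `Rule` dataclass and the `rule_activated` function are hypothetical names that mirror the fields of OnlineEndpointRule, and they are not part of kaldi.online2.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in mirroring the fields of OnlineEndpointRule."""
    must_contain_nonsilence: bool = True
    min_trailing_silence: float = 1.0
    max_relative_cost: float = math.inf
    min_utterance_length: float = 0.0


def rule_activated(rule, contains_nonsilence, trailing_silence,
                   relative_cost, utterance_length):
    """Return True only if all four conditions of the rule hold."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


rule = Rule(must_contain_nonsilence=True, min_trailing_silence=2.0)
# 2.5 s of trailing silence after some speech: the rule fires.
fires = rule_activated(rule, True, 2.5, relative_cost=3.0, utterance_length=6.0)
# Same silence, but only silence was decoded: the rule does not fire.
silent = rule_activated(rule, False, 2.5, relative_cost=3.0, utterance_length=6.0)
```

Endpointing as a whole is then the OR of this check over all configured rules.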

For details of the default rules, see OnlineEndpointConfig.

It’s up to the caller whether to use final-probs or not when generating the best-path, i.e. decoder.get_best_path(use_final_probs=True|False), but we recommend not using them. If you do use them, then depending on the grammar, you may force the best-path to decode non-silence even though that was not what it really preferred to decode.
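To make the effect of final-probs concrete, here is a toy model in plain Python. It is a sketch under simplifying assumptions, not the decoder's implementation: each active token at the last decoded frame is a hypothetical `(label, cost, final_cost)` tuple, and `final_cost` is infinite when the token is not in a final state of the graph.

```python
import math


def best_token(tokens, use_final_probs):
    """Pick the lowest-cost token active at the last decoded frame.

    Final costs are only applied if requested and if at least one token is
    in a final state; otherwise they are ignored (treated as probability one).
    """
    if use_final_probs and any(f < math.inf for _, _, f in tokens):
        return min(tokens, key=lambda t: t[1] + t[2])
    return min(tokens, key=lambda t: t[1])


def relative_cost(tokens):
    """Best cost including final costs minus best cost ignoring them."""
    best = min(c for _, c, _ in tokens)
    best_final = min(c + f for _, c, f in tokens)
    return best_final - best  # math.inf if no token is in a final state


tokens = [("still in silence", 10.0, math.inf),
          ("forced word", 12.5, 0.0)]
preferred = best_token(tokens, use_final_probs=False)[0]
forced = best_token(tokens, use_final_probs=True)[0]
```

In this toy case the path the decoder really prefers is still in silence, but applying final-probs switches the best path to the hypothesis that ends in a final state, which is the behavior the recommendation above is meant to avoid.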

Functions

`decoding_endpoint_detected`	Determines if we should terminate decoding.
`decoding_endpoint_detected_grammar`	Determines if we should terminate decoding.
`endpoint_detected`	Determines if any of the endpointing rules are active for given arguments.
`trailing_silence_length`	Returns the number of trailing silence frames on the best-path traceback.
`trailing_silence_length_grammar`	Returns the number of trailing silence frames on the best-path traceback.
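As a rough picture of what counting trailing silence frames means, the sketch below works on a flat list of per-frame phone ids. That input format and the function name are assumptions made for illustration; the functions listed above operate on the decoder's best-path traceback instead.

```python
def trailing_silence_frames(phones_per_frame, silence_phones):
    """Count frames at the end of the traceback that are silence phones."""
    count = 0
    for phone in reversed(phones_per_frame):
        if phone not in silence_phones:
            break  # stop counting at the first non-silence phone
        count += 1
    return count


# Hypothetical per-frame phone ids; 1 and 2 are treated as silence phones.
traceback = [1, 1, 7, 7, 9, 9, 9, 1, 1, 2]
frames = trailing_silence_frames(traceback, {1, 2})
seconds = frames * 0.01  # assuming a 10 ms frame shift
```

Multiplying the frame count by the frame shift gives the trailing silence in seconds, which is the unit used by min_trailing_silence.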

Classes

`DecodableDiagGmmScaledOnline`	Decodable for online decoding with diagonal GMMs.
`OnlineEndpointConfig`	Online endpointing configuration.
`OnlineEndpointRule`	Online endpointing rule.
`OnlineFeaturePipeline`	Online feature pipeline.
`OnlineFeaturePipelineCommandLineConfig`	Command-line configuration options for online feature pipeline.
`OnlineFeaturePipelineConfig`	Configuration options for online feature pipeline.
`OnlineGmmAdaptationState`	Online GMM adaptation state.
`OnlineGmmDecodingAdaptationPolicyConfig`	Configuration options for re-estimating basis-fMLLR during online decoding.
`OnlineGmmDecodingConfig`	Configuration options for online GMM decoding.
`OnlineGmmDecodingModels`	GMM models used for online decoding.
`OnlineIvectorExtractionConfig`	Command-line configuration options for online ivector extraction.
`OnlineIvectorExtractionInfo`	Configuration options for online iVector extraction.
`OnlineIvectorExtractorAdaptationState`	Adaptation state of the online ivector extractor.
`OnlineIvectorFeature`	Online ivector extractor.
`OnlineNnetFeaturePipeline`	Online feature pipeline for neural network decoding.
`OnlineNnetFeaturePipelineConfig`	Command-line configuration options for online neural network feature pipeline.
`OnlineNnetFeaturePipelineInfo`	Configuration options for online neural network feature pipeline.
`OnlineSilenceWeighting`	Online silence weighting.
`OnlineSilenceWeightingConfig`	Configuration options for online silence weighting.
`SingleUtteranceGmmDecoder`	Online lattice-generating decoder for diagonal GMM models.
`SingleUtteranceNnetDecoder`	Online lattice-generating decoder for neural network models.
`SingleUtteranceNnetGrammarDecoder`	Online lattice-generating decoder for neural network models.

class kaldi.online2.DecodableDiagGmmScaledOnline(am, trans_model, scale, input_feats)¶

Decodable for online decoding with diagonal GMMs.

Parameters:	am (AmDiagGmm) – Diagonal GMM. trans_model (TransitionModel) – Transition model. scale (float) – Acoustic scale. input_feats (OnlineFeatureInterface) – Online input features.

is_last_frame(frame:int) → bool¶: Checks if given frame is the last frame.

log_likelihood(frame:int, index:int) → float¶: Returns the log-likelihood of the given index for the given frame.

num_frames_ready() → int¶: Returns number of frames ready for decoding.

num_indices() → int¶: Returns number of indices.

class kaldi.online2.OnlineEndpointConfig¶

Online endpointing configuration.

Decoding is terminated if any of the endpointing rules evaluates to True.

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

rule1¶: Default Rule1 times out after 5 seconds of silence, even if nothing was decoded.

rule2¶: Default Rule2 times out after 0.5 seconds of silence if a final state was reached with good probability.

rule3¶: Default Rule3 times out after 1.0 seconds of silence if a final state was reached with OK probability.

rule4¶: Default Rule4 times out after 2.0 seconds of silence after decoding something, even if a final state was not reached.

rule5¶: Default Rule5 times out once the utterance is 20.0 seconds long.

silence_phones¶: Colon-separated list of phones to be considered as silence.
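The five default rules can be pictured as plain data combined with a logical OR. The sketch below is self-contained Python and does not use the kaldi.online2 classes. The trailing-silence and utterance-length values follow the descriptions above, while the max_relative_cost thresholds of 2.0 (“good”) and 8.0 (“OK”) are assumptions that this page does not state.

```python
import math
from collections import namedtuple

# Field order: must_contain_nonsilence, min_trailing_silence,
# max_relative_cost, min_utterance_length.
Rule = namedtuple("Rule", "nonsil min_sil max_cost min_len")

DEFAULT_RULES = {
    "rule1": Rule(False, 5.0, math.inf, 0.0),
    "rule2": Rule(True, 0.5, 2.0, 0.0),   # 2.0 is an assumed threshold
    "rule3": Rule(True, 1.0, 8.0, 0.0),   # 8.0 is an assumed threshold
    "rule4": Rule(True, 2.0, math.inf, 0.0),
    "rule5": Rule(False, 0.0, math.inf, 20.0),
}


def active_rules(contains_nonsilence, trailing_silence, relative_cost,
                 utterance_length, rules=DEFAULT_RULES):
    """Return the names of all rules whose conditions are satisfied."""
    return [name for name, r in rules.items()
            if (contains_nonsilence or not r.nonsil)
            and trailing_silence >= r.min_sil
            and relative_cost <= r.max_cost
            and utterance_length >= r.min_len]


def endpoint(*args):
    """Decoding stops as soon as any rule is active (the OR of all rules)."""
    return bool(active_rules(*args))
```

For example, 0.6 seconds of trailing silence after speech with a relative cost of 1.0 activates only the second rule in this sketch.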

class kaldi.online2.OnlineEndpointRule¶

Online endpointing rule.

Endpointing rule applies if all of the conditions are satisfied.

Parameters:

must_contain_nonsilence (bool) – If true, endpointing rule applies only if best-path traceback contains non-silence.
min_trailing_silence (float) – Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.
max_relative_cost (float) – Endpointing rule applies only if relative-cost of final-states <= this value.
min_utterance_length (float) – Endpointing rule applies only if utterance length (in seconds) >= this value.

max_relative_cost¶: Endpointing rule applies only if relative-cost of final-states <= this value.

min_trailing_silence¶: Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.

min_utterance_length¶: Endpointing rule applies only if utterance length (in seconds) >= this value.

must_contain_nonsilence¶: If true, endpointing rule applies only if best-path traceback contains non-silence.

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

register_with_prefix(prefix:str, opts:OptionsItf)¶

Registers prefixed options with an object implementing the options interface.

Parameters:	prefix (str) – String that will be prepended to option names. opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

class kaldi.online2.OnlineFeaturePipeline¶

Online feature pipeline.

This class is responsible for putting together the various stages of the feature-processing pipeline, in an online setting. This does not attempt to be fully generic; we just try to handle the common case. Since the online-decoding code needs to “know about” things like CMN and fMLLR in order to do adaptation, it’s hard to make this completely generic.

Parameters:	config (OnlineFeaturePipelineConfig) – Configuration options for online feature pipeline.

accept_waveform(sampling_rate:float, waveform:VectorBase)¶

Accepts more data to process.

It won’t actually process the data, it will just copy it.

Parameters:	sampling_rate (float) – Sampling rate of the waveform. It is needed to assert that it matches the sampling rate given in the config. waveform (Vector) – More data to process.

dim() → int¶: Returns feature dimension

frame_shift_in_seconds() → float¶: Returns frame shift in seconds

freeze_cmvn()¶

Freezes CMVN.

Raises:	RuntimeError – If num_frames_ready() == 0.

get_cmvn_state() → OnlineCmvnState¶: Returns the CMVN state.

get_frame(frame:int, feat:VectorBase)¶: Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)¶: Returns the features for given frame indices

have_fmllr_transform() → bool¶: Returns True if an fMLLR transform has been set.

input_finished()¶

Tells the class that you won’t be providing any more waveform.

This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).

is_last_frame(frame:int) → bool¶: Returns True if this is last frame, otherwise False

new() → OnlineFeaturePipeline¶

Returns a newly initialized copy.

This does not duplicate all the internal state or the speaker-adaptation state, but gives you a freshly initialized version of this object, as if you had initialized it using the constructor that takes the configuration object. After calling this you may want to call set_cmvn_state() and set_transform().

num_frames_ready() → int¶: Returns number of frames ready

set_cmvn_state(cmvn_state:OnlineCmvnState)¶: Sets the CMVN state.

set_transform(transform:MatrixBase)¶

Sets the fMLLR transform.

Call it with an empty matrix if you want to stop it using any transform.

class kaldi.online2.OnlineFeaturePipelineCommandLineConfig¶

Command-line configuration options for online feature pipeline.

This configuration class is used to set up OnlineFeaturePipelineConfig, which in turn is the configuration class for OnlineFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.

add_deltas¶: Append delta features (default=False)

add_pitch¶: Append pitch features to raw MFCC/PLP features (default=False)

cmvn_config¶: Configuration file for online CMVN features (e.g. conf/online_cmvn.conf)

delta_config¶

Configuration file for delta features (e.g. conf/delta.conf)

If not supplied, will not compute delta features; supply empty config to use defaults.

fbank_config¶: Configuration file for filterbank features (e.g. conf/fbank.conf)

feature_type¶: Base feature type [mfcc (default), plp, fbank].

global_cmvn_stats_rxfilename¶: Extended filename for global CMVN stats (e.g. ‘ark:matrix-sum scp:data/train/cmvn.scp’)

lda_rxfilename¶: Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)

mfcc_config¶: Configuration file for MFCC features (e.g. conf/mfcc.conf)

pitch_config¶: Configuration file for pitch features (e.g. conf/pitch.conf)

pitch_process_config¶: Configuration file for post-processing pitch features (e.g. conf/pitch_process.conf)

plp_config¶: Configuration file for PLP features (e.g. conf/plp.conf)

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

splice_config¶: Configuration file for feature splicing, if done (e.g. prior to LDA)

splice_feats¶: Splice features with left and right context (default=False)

class kaldi.online2.OnlineFeaturePipelineConfig¶

Configuration options for online feature pipeline.

This configuration class is responsible for storing the configuration options for OnlineFeaturePipeline. The options can be set either directly in code or indirectly by reading config files on disk via OnlineFeaturePipelineCommandLineConfig.

add_deltas¶: Append delta features (default=True)

add_pitch¶: Append pitch features to raw MFCC/PLP features (default=False)

cmvn_opts¶: Options for online CMVN features

delta_opts¶: Options for delta features

fbank_opts¶: Options for filterbank features

feature_type¶: Base feature type [mfcc (default), plp, fbank]

frame_shift_in_seconds() → float¶: Returns frame shift in seconds.

from_config(cmdline_config:OnlineFeaturePipelineCommandLineConfig) → OnlineFeaturePipelineConfig¶: Creates a new OnlineFeaturePipelineConfig from OnlineFeaturePipelineCommandLineConfig.

global_cmvn_stats_rxfilename¶: Extended filename for global CMVN stats (e.g. ‘ark:matrix-sum scp:data/train/cmvn.scp’)

lda_rxfilename¶: Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)

mfcc_opts¶: Options for MFCC features

pitch_opts¶: Options for pitch features

pitch_process_opts¶: Options for post-processing pitch features

plp_opts¶: Options for PLP features

splice_feats¶: Splice features with left and right context (default=False)

splice_opts¶: Options for feature splicing, if done

class kaldi.online2.OnlineGmmAdaptationState¶

Online GMM adaptation state.

cmvn_state¶: Online CMVN state

read(in_stream:istream, binary:bool)¶: Reads this object from input stream.

spk_stats¶: Speaker transform stats.

transform¶: Transform matrix

write(out_stream:ostream, binary:bool)¶: Writes this object to output stream.

class kaldi.online2.OnlineGmmDecodingAdaptationPolicyConfig¶

Configuration options for re-estimating basis-fMLLR during online decoding.

adaptation_delay¶: Delay before first basis-fMLLR adaptation for not-first utterances of each speaker

adaptation_first_utt_delay¶: Delay before first basis-fMLLR adaptation for first utterance of each speaker

adaptation_first_utt_ratio¶: Ratio that controls frequency of fMLLR adaptation for first utterance of each speaker

adaptation_ratio¶: Ratio that controls frequency of fMLLR adaptation for not-first utterances of each speaker

check()¶: Checks if configuration is valid.

do_adapt(chunk_begin_secs:float, chuck_end_secs:float, is_first_utterance:bool) → bool¶

Checks if we are scheduled to re-estimate fMLLR.

Parameters:	chunk_begin_secs (float) – Chunk begin time in seconds. chuck_end_secs (float) – Chunk end time in seconds. is_first_utterance (bool) – First utterance or not.
Returns:	True if we are scheduled to re-estimate fMLLR in the interval `[chunk_begin_secs, chuck_end_secs)`, False otherwise.
Return type:	bool
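One way to picture such a schedule is a geometric sequence of adaptation times. The sketch below rests on that assumption, which this page does not state: adaptation is taken to be due at delay, delay * ratio, delay * ratio^2 and so on. The function name and the example values are hypothetical.

```python
def scheduled_to_adapt(chunk_begin_secs, chunk_end_secs, delay, ratio):
    """Return True if an assumed adaptation time falls in [begin, end).

    Assumption: adaptation is due at delay, delay*ratio, delay*ratio**2, ...
    """
    if delay <= 0.0 or ratio <= 1.0:
        raise ValueError("expected delay > 0 and ratio > 1")
    t = delay
    while t < chunk_begin_secs:
        t *= ratio  # advance to the first scheduled time >= chunk begin
    return t < chunk_end_secs


# Illustrative values only: adaptation due at 2.0 s, 3.0 s, 4.5 s, ...
first = scheduled_to_adapt(1.5, 2.5, delay=2.0, ratio=1.5)
between = scheduled_to_adapt(3.1, 4.4, delay=2.0, ratio=1.5)
```

Under this reading, a larger ratio makes re-estimation progressively rarer as the utterance gets longer.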

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

class kaldi.online2.OnlineGmmDecodingConfig¶

Configuration options for online GMM decoding.

acoustic_scale¶: Scaling factor for acoustic likelihoods

adaptation_policy_opts¶: Options for re-estimating basis-fMLLR during online decoding

basis_opts¶: Options for basis-fMLLR adaptation

faster_decoder_opts¶: Options for lattice-generating faster decoder.

fmllr_basis_rxfilename¶: Extended filename for reading the basis elements for basis-fMLLR.

fmllr_lattice_beam¶: Beam used in pruning lattices for fMLLR estimation

model_rxfilename¶

Extended filename for reading the model used to estimate fMLLR transforms.

This is required.

online_alimdl_rxfilename¶

Extended filename for reading the model trained with online-CMN features.

This is only needed if it is different from model_rxfilename.

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

rescore_model_rxfilename¶

Extended filename for reading the discriminatively trained model.

This is only needed if it is different from model_rxfilename.

silence_phones¶: Colon-separated list of integer ids of silence phones

silence_weight¶

Weight applied to silence frames for fMLLR estimation.

This has an effect only if silence_phones option is supplied.

class kaldi.online2.OnlineGmmDecodingModels(config)¶

GMM models used for online decoding.

This class is used to read, store and give access to the models used for 3 phases of decoding (first-pass with online-CMN features; the ML models used for estimating transforms; and the discriminatively trained models). It takes care of the logic whereby if, say, the last model isn’t given we default to the second model, and so on, and it interprets the filenames from the config object.

Parameters:	config (OnlineGmmDecodingConfig) – Options for online GMM decoding.
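The defaulting logic described above can be sketched as a small function over the three filename options. This is an illustration in plain Python, not the class's implementation; `resolve_models` and the dictionary keys are hypothetical names, and the file names in the example are placeholders.

```python
def resolve_models(model_rxfilename, online_alimdl_rxfilename="",
                   rescore_model_rxfilename=""):
    """Map each decoding phase to the model file it would use."""
    if not model_rxfilename:
        raise ValueError("model_rxfilename is required")
    return {
        # First pass: falls back to the ML model if no online-CMN model is given.
        "online_alignment": online_alimdl_rxfilename or model_rxfilename,
        # The ML model used to estimate transforms is always required.
        "transform_estimation": model_rxfilename,
        # Final pass: falls back to the ML model if no rescoring model is given.
        "final": rescore_model_rxfilename or model_rxfilename,
    }


only_ml = resolve_models("final.mdl")
with_alimdl = resolve_models("final.mdl", online_alimdl_rxfilename="final.oalimdl")
```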

basis_fmllr_estimate() → BasisFmllrEstimate¶: Returns the basis elements for basis-fMLLR.

get_final_model() → AmDiagGmm¶

Returns the discriminatively trained model.

This is the model given in the config if one was supplied; otherwise the ML-trained model is returned.

get_model() → AmDiagGmm¶: Returns the ML-trained model used to get transforms.

get_online_alignment_model() → AmDiagGmm¶

Returns the model trained with online-CMN features.

This is the model given in the config if one was supplied; otherwise the ML-trained model is returned.

get_transition_model() → TransitionModel¶: Returns the transition model.

class kaldi.online2.OnlineIvectorExtractionConfig¶

Command-line configuration options for online ivector extraction.

This class includes configuration variables relating to the online ivector extraction, but not the configuration for the “base feature”, i.e. MFCC/PLP/filterbank, which is an input to this feature.

This configuration class is used to set up OnlineIvectorExtractionInfo, which in turn is the configuration class for OnlineIvectorFeature. Instead of taking the options for each part of the online ivector extractor directly, it reads in the configuration file for each part.

cmvn_config_rxfilename¶: Extended filename for reading the online CMVN configuration file.

diag_ubm_rxfilename¶: Extended filename for reading the diagonal UBM used for obtaining the posteriors.

global_cmvn_stats_rxfilename¶: Extended filename for reading the global CMVN stats.

greedy_ivector_extractor¶: Whether to read ahead as much as we can when computing ivector stats (default=False).

ivector_extractor_rxfilename¶: Extended filename for reading the ivector extractor.

ivector_period¶: Online ivector period (default=10).

lda_mat_rxfilename¶: Extended filename for reading the LDA matrix.

max_count¶: If nonzero, maximum stats count we allow before scaling down stats (default=0.0).

max_remembered_frames¶: Largest number of frames to remember between utterances of the same speaker (default=1000).

min_post¶: Threshold for posterior pruning in ivector extraction (default=0.025).

num_cg_iters¶: Number of conjugate-gradient iterations (default=15).

num_gselect¶: Maximum number of posteriors to use per frame in ivector extraction (default=5).

posterior_scale¶: Scale on posteriors used in ivector extraction (default=0.1).

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

splice_config_rxfilename¶: Extended filename for reading the online frame splicing configuration file.

use_most_recent_ivector¶: Whether to return the most recent ivector rather than the one for current frame (default=True).

class kaldi.online2.OnlineIvectorExtractionInfo¶

Configuration options for online iVector extraction.

check()¶: Checks if configuration options are valid.

cmvn_opts¶: Online CMVN options

diag_ubm¶: Diagonal UBM

extractor¶: Ivector extractor

from_config(config:OnlineIvectorExtractionConfig) → _OnlineIvectorExtractionInfo¶: Creates a new OnlineIvectorExtractionInfo from an OnlineIvectorExtractionConfig.

global_cmvn_stats¶: Global CMVN stats

greedy_ivector_extractor¶: Whether to read ahead as much as we can when computing ivector stats (default=False).

init(config:OnlineIvectorExtractionConfig)¶: Initializes with the given config.

ivector_period¶: Online ivector period (default=10).

lda_mat¶: LDA matrix

max_count¶: If nonzero, maximum stats count we allow before scaling down stats (default=0.0).

max_remembered_frames¶: Largest number of frames to remember between utterances of the same speaker (default=1000).

min_post¶: Threshold for posterior pruning in ivector extraction (default=0.025).

num_cg_iters¶: Number of conjugate-gradient iterations (default=15).

num_gselect¶: Maximum number of posteriors to use per frame in ivector extraction (default=5).

posterior_scale¶: Scale on posteriors used in ivector extraction (default=0.1).

splice_opts¶: Online frame splicing options

use_most_recent_ivector¶: Whether to return the most recent ivector rather than the one for current frame (default=True).

class kaldi.online2.OnlineIvectorExtractorAdaptationState¶

Adaptation state of the online ivector extractor.

This class stores the adaptation state from the online ivector extractor, which can help you to initialize the adaptation state for the next utterance of the same speaker in a more informed way.

cmvn_state¶: Online CMVN state (used for getting posteriors for ivector extraction)

from_info(info:_OnlineIvectorExtractionInfo) → OnlineIvectorExtractorAdaptationState¶: Creates a new OnlineIvectorExtractorAdaptationState from OnlineIvectorExtractionInfo.

from_other(other:OnlineIvectorExtractorAdaptationState) → OnlineIvectorExtractorAdaptationState¶: Creates a new OnlineIvectorExtractorAdaptationState from another.

ivector_stats¶: Stats for online ivector estimation

limit_frames(max_remembered_frames:float, posterior_scale:float)¶

Limits the frames.

Scales down the stats if needed to ensure the number of frames in the speaker-specific CMVN stats does not exceed max_remembered_frames.
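The scaling step can be illustrated with a single count. The sketch below is an assumption about the general idea (proportional down-scaling once a count exceeds the limit); it ignores posterior_scale and the actual stats objects, and `limit_count` is a hypothetical name.

```python
def limit_count(count, max_remembered_frames):
    """Return the factor that would bring count down to the limit."""
    if max_remembered_frames > 0 and count > max_remembered_frames:
        return max_remembered_frames / count
    return 1.0  # already within the limit, nothing to scale


cmvn_count = 2000.0  # hypothetical frames accumulated for one speaker
scale = limit_count(cmvn_count, 1000.0)
scaled_count = cmvn_count * scale
```

Scaling the accumulated stats by this factor keeps their relative shape while capping how much weight past utterances of the speaker can carry.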

read(is:istream, binary:bool)¶: Reads this object from input stream.

write(os:ostream, binary:bool)¶: Writes this object to output stream.

class kaldi.online2.OnlineIvectorFeature¶

Online ivector extractor.

This class extracts online ivectors from raw features such as MFCC, PLP or filterbank.

Parameters:	info (OnlineIvectorExtractionInfo) – Options for online ivector extraction. base_feature (OnlineFeatureInterface) – Raw features (MFCC, PLP or filterbank).

dim() → int¶: Returns feature dimension

frame_shift_in_seconds() → float¶: Returns frame shift in seconds

get_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)¶: Gets online iVector adaptation state.

get_frame(frame:int, feat:VectorBase)¶: Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)¶: Returns the features for given frame indices

is_last_frame(frame:int) → bool¶: Returns True if this is last frame, otherwise False

num_frames() → float¶: Returns number of frames seen

num_frames_ready() → int¶: Returns number of frames ready

objf_impr_per_frame() → float¶: Returns the objective function improvement per frame from iVector estimation

set_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)¶: Sets online iVector adaptation state.

ubm_loglike_per_frame() → float¶: Returns UBM log-like per frame

update_frame_weights(delta_weights:list<tuple<int, float>>)¶: Updates frame weights.

class kaldi.online2.OnlineNnetFeaturePipeline¶

Online feature pipeline for neural network decoding.

This is a different version of the online feature pipeline specialized for use in neural network decoding with iVectors. Our recipe is that we extract iVectors that will be used as an additional input to the neural network, in addition to a window of several frames of spliced raw features (MFCC, PLP or filterbanks). The iVectors are extracted on top of a (splice+LDA+MLLT) feature pipeline, with the added complication that the GMM posteriors used for the iVector extraction are obtained with a version of the features that has online cepstral mean (and optionally variance) normalization, whereas the stats for iVector are accumulated with a non-mean-normalized version of the features. The idea here is that we want the iVector to learn the mean offset, but we want the posteriors to be somewhat invariant to mean offsets.

Parameters:	config (OnlineNnetFeaturePipelineInfo) – Configuration options for online neural network feature pipeline.

accept_waveform(sampling_rate:float, waveform:VectorBase)¶

Accepts more data to process.

It won’t actually process the data, it will just copy it.

Parameters:	sampling_rate (float) – Sampling rate of the waveform. It is needed to assert that it matches the sampling rate given in the config. waveform (Vector) – More data to process.

dim() → int¶: Returns feature dimension

frame_shift_in_seconds() → float¶: Returns frame shift in seconds

get_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)¶: Gets online iVector adaptation state.

get_frame(frame:int, feat:VectorBase)¶: Returns the features for given frame index

get_frames(frames:list<int>, feats:MatrixBase)¶: Returns the features for given frame indices

input_feature() → OnlineFeatureInterface¶: Returns the part of the feature pipeline that would be given as the primary (non-iVector) input to neural network.

input_finished()¶

Tells the class that you won’t be providing any more waveform.

This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).

is_last_frame(frame:int) → bool¶: Returns True if this is last frame, otherwise False

ivector_feature() → OnlineIvectorFeature¶: Returns the ivector-extraction part of the feature pipeline (or None if iVectors are not being used).

num_frames_ready() → int¶: Returns number of frames ready

set_adaptation_state(adaptation_state:OnlineIvectorExtractorAdaptationState)¶: Sets online iVector adaptation state.

class kaldi.online2.OnlineNnetFeaturePipelineConfig¶

Command-line configuration options for online neural network feature pipeline.

This configuration class is used to set up OnlineNnetFeaturePipelineInfo, which in turn is the configuration class for OnlineNnetFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.

add_pitch¶: Append pitch features to raw MFCC/PLP features (default=False)

fbank_config¶: Configuration file for filterbank features (e.g. conf/fbank.conf)

feature_type¶: Base feature type [mfcc (default), plp, fbank].

ivector_extraction_config¶: Configuration file for online iVector extraction (e.g. conf/ivector.conf)

mfcc_config¶: Configuration file for MFCC features (e.g. conf/mfcc.conf)

online_pitch_config¶: Configuration file for online pitch features (e.g. conf/online_pitch.conf)

plp_config¶: Configuration file for PLP features (e.g. conf/plp.conf)

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

silence_weighting_config¶: Options for online silence weighting

class kaldi.online2.OnlineNnetFeaturePipelineInfo¶

Configuration options for online neural network feature pipeline.

This configuration class is responsible for storing the configuration options for OnlineNnetFeaturePipeline (including the actual LDA and CMVN-stats matrices, and the iVector extractor). The options can be set either directly in code or indirectly by reading config files on disk via OnlineNnetFeaturePipelineConfig.

add_pitch¶: Append pitch features to raw MFCC/PLP features (default=False)

fbank_opts¶: Options for filterbank features

feature_type¶: Base feature type [mfcc (default), plp, fbank]

frame_shift_in_seconds() → float¶: Returns frame shift in seconds.

from_config(config:OnlineNnetFeaturePipelineConfig) → OnlineNnetFeaturePipelineInfo¶: Creates a new OnlineNnetFeaturePipelineInfo from OnlineNnetFeaturePipelineConfig.

ivector_dim() → int¶: Returns iVector dimension.

ivector_extractor_info¶: Options for online iVector extraction.

mfcc_opts¶: Options for MFCC features

pitch_opts¶: Options for pitch features

pitch_process_opts¶: Options for post-processing pitch features

plp_opts¶: Options for PLP features

silence_weighting_config¶: Options for weighting silence in iVector adaptation.

use_ivectors¶: Use iVectors as an extra input to the neural net

class kaldi.online2.OnlineSilenceWeighting¶

Online silence weighting.

This class is responsible for keeping track of the best-path traceback from the decoder (efficiently) and computing a weighting of the data based on the classification of frames as silence (or not silence). It also applies a duration limitation, so that data from a very long run of the same transition-id gets weighted down (such runs are often associated with misrecognition or silence).

Parameters:	trans_model (TransitionModel) – The transition model. config (OnlineSilenceWeightingConfig) – Options for online silence weighting. frame_subsampling_factor (int) – Frame subsampling factor (default=1).

active() → bool¶: Returns true if list of silence phones is not empty and silence weight is not 1.0

compute_current_traceback(decoder:LatticeFasterOnlineDecoder)¶: Computes current traceback.

compute_current_traceback_grammar(decoder:LatticeFasterOnlineGrammarDecoder)¶: Computes current traceback.

get_delta_weights(num_frames_ready_in:int) → list<tuple<int, float>>¶

Gets the changes in frame weights.

Parameters:	num_frames_ready_in (int) – Number of frames available at the input of the online iVector extractor.
Returns:	Delta weights as list of (frame-index, delta-weight) tuples.
Return type:	List[Tuple[int, float]]
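The idea of frame weights and their deltas can be sketched in plain Python. This is a simplified model under stated assumptions, not the class's algorithm: the traceback is given as hypothetical `(transition_id, phone)` pairs per frame, every frame in an over-long run is weighted down, and frames without a previous weight are assumed to have had weight `new_data_weight`.

```python
from itertools import groupby


def frame_weights(frames, silence_phones, silence_weight, max_state_duration=0):
    """Weight silence frames and over-long runs of one transition-id down."""
    weights = []
    for _, run in groupby(frames, key=lambda f: f[0]):
        run = list(run)
        # A limit of 0 disables the duration check.
        too_long = 0 < max_state_duration < len(run)
        for _, phone in run:
            down = phone in silence_phones or too_long
            weights.append(silence_weight if down else 1.0)
    return weights


def delta_weights(old, new, new_data_weight=1.0):
    """Return (frame-index, delta-weight) pairs for frames whose weight changed."""
    deltas = []
    for i, w in enumerate(new):
        prev = old[i] if i < len(old) else new_data_weight
        if w != prev:
            deltas.append((i, w - prev))
    return deltas


# Phone 1 is silence; transition-id 8 repeats for four frames.
frames = [(3, 1), (3, 1), (8, 5), (8, 5), (8, 5), (8, 5), (9, 6)]
weights = frame_weights(frames, {1}, silence_weight=0.25)
changes = delta_weights([0.25, 1.0], weights)
```

Returning only the changes keeps the update passed to the iVector extractor small when the traceback is mostly stable between chunks.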

class kaldi.online2.OnlineSilenceWeightingConfig¶

Configuration options for online silence weighting.

active() → bool¶: Returns true if list of silence phones is not empty and silence weight is not 1.0

max_state_duration¶: Maximum allowed duration of a single transition-id.

new_data_weight¶: Scale applied to data for which there is no decoder traceback yet.

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

register_with_prefix(prefix:str, opts:OptionsItf)¶

Registers prefixed options with an object implementing the options interface.

Parameters:	prefix (str) – String that will be prepended to option names. opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

silence_phones_str¶: Colon- or comma-separated list of integer ids for silence phones.

silence_weight¶: Weighting factor for silence frames.

class kaldi.online2.SingleUtteranceGmmDecoder¶

Online lattice-generating decoder for diagonal GMM models.

This class is used for decoding a single utterance in an online fashion using diagonal GMMs.

Parameters:	config (OnlineGmmDecodingConfig) – Options for online GMM decoding. models (OnlineGmmDecodingModels) – Models for online GMM decoding. feature_prototype (OnlineFeaturePipeline) – Online feature pipeline. fst (StdFst) – Decoding graph. adaptation_state (OnlineGmmAdaptationState) – Online GMM adaptation state.

advance_decoding()¶

Advances the decoding until there are no more frames to decode.

This may also estimate fMLLR after advancing the decoding, depending on the configuration.

endpoint_detected(config:OnlineEndpointConfig) → bool¶

Determines if we should terminate decoding current utterance.

Parameters:	config (`OnlineEndpointConfig`) – Online endpointing configuration.
Returns:	True if an endpointing rule is active.
Return type:	bool

estimate_fmllr(end_of_utterance:bool)¶: Estimates the [basis-]fMLLR transform and applies it to the features.

feature_pipeline() → OnlineFeaturePipeline¶: Returns the online feature pipeline.

final_relative_cost() → float¶: Returns the final relative cost.

finalize_decoding()¶: Finalizes the decoding.

get_adaptation_state(adaptation_state:OnlineGmmAdaptationState)¶: Returns the adaptation state.

get_best_path(end_of_utterance:bool) → LatticeVectorFst¶

Gets best path as a lattice.

Parameters:	end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The best path.
Return type:	LatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

get_lattice(rescore_if_needed:bool, end_of_utterance:bool) → CompactLatticeVectorFst¶

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters:

rescore_if_needed (bool) – If this is True and there is any point in rescoring the state-level lattice, it will rescore the lattice.
end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The lattice-determinized compact lattice.
Return type:	CompactLatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

have_transform() → bool¶: Returns True if we already have an fMLLR transform.

class kaldi.online2.SingleUtteranceNnetDecoder¶

Online lattice-generating decoder for neural network models.

This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models.

Parameters:

decoder_opts (LatticeFasterDecoderOptions) – Configuration options for lattice-generating decoder.
trans_model (TransitionModel) – Transition model.
info (DecodableNnetSimpleLoopedInfo) – Static pre-computed information needed for nnet3 computation (including a reference to the model).
fst (StdFst) – Decoding graph.
features (OnlineNnetFeaturePipeline) – Online feature pipeline.

advance_decoding()¶: Advances decoding until there are no more frames to decode.

decoder() → LatticeFasterOnlineDecoder¶: Returns the underlying decoder object.

Note

The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder.LatticeFasterOnlineDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineDecoder. Hence, it does not support the additional decoder API implemented in Python.

endpoint_detected(config:OnlineEndpointConfig) → bool¶

Determines if we should terminate decoding the current utterance.

Parameters:	config (`OnlineEndpointConfig`) – Online endpointing configuration.
Returns:	True if an endpointing rule is active.
Return type:	bool

finalize_decoding()¶

Finalizes decoding.

This method may be optionally called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.
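The call order described above (advance after each chunk, check for an endpoint, then finalize before fetching the lattice) can be sketched as follows. To keep the example runnable without PyKaldi, it uses a hypothetical `FakeDecoder` that only mimics the method names documented here; `decode_utterance` and the `feed_chunk` callback are likewise illustrative names, and how audio actually reaches the feature pipeline is left out.

```python
class FakeDecoder:
    """Hypothetical stand-in that mimics the documented method names only."""

    def __init__(self, endpoint_after_chunks):
        self._endpoint_after = endpoint_after_chunks
        self.chunks_seen = 0
        self.finalized = False

    def advance_decoding(self):
        # A real decoder would consume all frames that are currently available.
        self.chunks_seen += 1

    def endpoint_detected(self, config):
        # A real decoder would evaluate the endpointing rules in `config`.
        return self.chunks_seen >= self._endpoint_after

    def finalize_decoding(self):
        self.finalized = True

    def get_lattice(self, end_of_utterance):
        return ("lattice", self.chunks_seen, end_of_utterance)


def decode_utterance(decoder, chunks, endpoint_config, feed_chunk):
    """Decode chunk by chunk, stopping early once an endpoint is detected."""
    endpointed = False
    for chunk in chunks:
        feed_chunk(chunk)            # hand new audio to the feature pipeline
        decoder.advance_decoding()   # decode whatever frames are now ready
        if decoder.endpoint_detected(endpoint_config):
            endpointed = True
            break
    decoder.finalize_decoding()      # optional extra pruning before get_lattice()
    return decoder.get_lattice(True), endpointed


fed = []
fake = FakeDecoder(endpoint_after_chunks=3)
lattice, endpointed = decode_utterance(fake, range(10), None, fed.append)
```

With a real decoder, the same loop shape would apply, but the feature pipeline and endpoint configuration would be actual PyKaldi objects rather than the placeholders used here.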

get_best_path(end_of_utterance:bool) → LatticeVectorFst¶

Gets best path as a lattice.

Parameters:	end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The best path.
Return type:	LatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

get_lattice(end_of_utterance:bool) → CompactLatticeVectorFst¶

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters:	end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The lattice-determinized compact lattice.
Return type:	CompactLatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

num_frames_decoded() → int¶

Queries the number of frames already decoded.

Returns:	The number of frames already decoded.
Return type:	int

class kaldi.online2.SingleUtteranceNnetGrammarDecoder¶

Online lattice-generating grammar decoder for neural network models.

This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models, with a GrammarFst as the decoding graph.

Parameters:

decoder_opts (LatticeFasterDecoderOptions) – Configuration options for lattice-generating decoder.
trans_model (TransitionModel) – Transition model.
info (DecodableNnetSimpleLoopedInfo) – Static pre-computed information needed for nnet3 computation (including a reference to the model).
fst (GrammarFst) – Decoding graph.
features (OnlineNnetFeaturePipeline) – Online feature pipeline.

advance_decoding()¶: Advances decoding until there are no more frames to decode.

decoder() → LatticeFasterOnlineGrammarDecoder¶: Returns the underlying decoder object.

Note

The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder_ext.LatticeFasterOnlineGrammarDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineGrammarDecoder. Hence, it does not support the additional decoder API implemented in Python.

endpoint_detected(config:OnlineEndpointConfig) → bool¶

Determines if we should terminate decoding the current utterance.

Parameters:	config (`OnlineEndpointConfig`) – Online endpointing configuration.
Returns:	True if an endpointing rule is active.
Return type:	bool

finalize_decoding()¶

Finalizes decoding.

This method may be optionally called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.

get_best_path(end_of_utterance:bool) → LatticeVectorFst¶

Gets best path as a lattice.

Parameters:	end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The best path.
Return type:	LatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

get_lattice(end_of_utterance:bool) → CompactLatticeVectorFst¶

Gets the lattice-determinized compact lattice.

The output is a deterministic compact lattice with a unique path for each word sequence.

Parameters:	end_of_utterance (bool) – If `True` and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns:	The lattice-determinized compact lattice.
Return type:	CompactLatticeVectorFst
Raises:	`RuntimeError` – In the unusual circumstances where no tokens survive.

num_frames_decoded() → int¶

Queries the number of frames already decoded.

Returns:	The number of frames already decoded.
Return type:	int

kaldi.online2.decoding_endpoint_detected(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineDecoder) → bool¶

Determines if we should terminate decoding.

This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.

Parameters:

config (`OnlineEndpointConfig`) – Online endpointing configuration.
tmodel (TransitionModel) – Transition model.
frame_shift_in_seconds (float) – Frame shift (in seconds).
decoder (LatticeFasterOnlineDecoder) – Online lattice-generating decoder.

Returns:	True if the endpointing rules determine that we should terminate decoding.
Return type:	bool

kaldi.online2.decoding_endpoint_detected_grammar(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineGrammarDecoder) → bool¶

Determines if we should terminate decoding.

This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.

Parameters:

config (`OnlineEndpointConfig`) – Online endpointing configuration.
tmodel (TransitionModel) – Transition model.
frame_shift_in_seconds (float) – Frame shift (in seconds).
decoder (LatticeFasterOnlineGrammarDecoder) – Online lattice-generating decoder.

Returns:	True if the endpointing rules determine that we should terminate decoding.
Return type:	bool

kaldi.online2.endpoint_detected(config:OnlineEndpointConfig, num_frames_decoded:int, trailing_silence_frames:int, frame_shift_in_seconds:float, final_relative_cost:float) → bool¶

Determines if any of the endpointing rules are active for given arguments.

Parameters:

config (`OnlineEndpointConfig`) – Online endpointing configuration.
num_frames_decoded (int) – Number of frames decoded.
trailing_silence_frames (int) – Number of trailing silence frames decoded.
frame_shift_in_seconds (float) – Frame shift (in seconds).
final_relative_cost (float) – Relative cost of final states.

Returns:	True if the endpointing rules determine that we should terminate decoding.
Return type:	bool
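To make the rule evaluation concrete, here is a minimal pure-Python sketch of the "OR of rules" logic described at the top of this module. It is not the PyKaldi implementation: the `Rule` dataclass, `rule_activated`, and `endpoint_detected_sketch` are hypothetical names, the rule values are illustrative only (see OnlineEndpointConfig for the real defaults), and it assumes an utterance contains non-silence whenever it has more decoded frames than trailing silence frames.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for a single endpointing rule."""
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


def rule_activated(rule, contains_nonsilence, trailing_silence,
                   relative_cost, utterance_length):
    """A rule fires only if all four of its conditions hold."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected_sketch(rules, num_frames_decoded, trailing_silence_frames,
                             frame_shift_in_seconds, final_relative_cost):
    """Return True if any rule fires for the given decoder statistics."""
    # Assumption: non-silence was decoded iff not every frame is trailing silence.
    contains_nonsilence = num_frames_decoded > trailing_silence_frames
    utterance_length = num_frames_decoded * frame_shift_in_seconds
    trailing_silence = trailing_silence_frames * frame_shift_in_seconds
    return any(
        rule_activated(rule, contains_nonsilence, trailing_silence,
                       final_relative_cost, utterance_length)
        for rule in rules
    )


# Illustrative values only; not guaranteed to match the library defaults.
ILLUSTRATIVE_RULES = [
    Rule(False, 5.0, math.inf, 0.0),   # long silence, even with no speech
    Rule(True, 0.5, 2.0, 0.0),         # short silence, near a final state
    Rule(True, 1.0, 8.0, 0.0),
    Rule(True, 2.0, math.inf, 0.0),
    Rule(False, 0.0, math.inf, 20.0),  # utterance has become very long
]

# 3.0 s decoded, 0.6 s of trailing silence, a final state has the best cost.
stop = endpoint_detected_sketch(ILLUSTRATIVE_RULES, 300, 60, 0.01, 0.0)
```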

kaldi.online2.trailing_silence_length(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineDecoder) → int¶

Returns the number of trailing silence frames on the best-path traceback.

Parameters:

tmodel (TransitionModel) – Transition model.
silence_phones (str) – Colon-separated list of integer ids of silence phones.
decoder (LatticeFasterOnlineDecoder) – Online decoder.

kaldi.online2.trailing_silence_length_grammar(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineGrammarDecoder) → int¶

Returns the number of trailing silence frames on the best-path traceback.

Parameters:

tmodel (TransitionModel) – Transition model.
silence_phones (str) – Colon-separated list of integer ids of silence phones.
decoder (LatticeFasterOnlineGrammarDecoder) – Online grammar decoder.
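The idea of counting trailing silence (walk back from the end of the best path and stop at the first non-silence phone) can be sketched in plain Python. This is only an illustration: it assumes the best-path traceback has already been reduced to one phone id per frame, which the real functions derive from the decoder and the transition model, and the helper name `count_trailing_silence` is hypothetical.

```python
def count_trailing_silence(frame_phones, silence_phones):
    """Count silence frames at the end of a per-frame phone sequence.

    frame_phones:   list of integer phone ids, one per decoded frame
                    (assumed to come from the best-path traceback).
    silence_phones: colon-separated list of integer silence phone ids.
    """
    silence = {int(p) for p in silence_phones.split(":") if p}
    count = 0
    for phone in reversed(frame_phones):
        if phone not in silence:
            break  # stop counting at the first non-silence frame
        count += 1
    return count


# Phones 1 and 2 are silence; the path ends with three silence frames.
n_trailing = count_trailing_silence([1, 1, 7, 7, 9, 1, 2, 1], "1:2")
```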