kaldi.online2
Online Endpointing
This module contains a simple facility for endpointing, which should be used in conjunction with the online decoding code. By endpointing in this context we mean “deciding when to stop decoding”, and not generic speech/silence segmentation. The use case that we have in mind is some kind of dialog system where, as more speech data comes in, we decode more and more, and we have to decide when to stop decoding.
The endpointing rule is a disjunction of conjunctions. The way we have it configured, it’s an OR of five rules, and each rule has the following form:
(<contains-nonsilence> || !rule.must_contain_nonsilence)
&& <length-of-trailing-silence> >= rule.min_trailing_silence
&& <relative-cost> <= rule.max_relative_cost
&& <utterance-length> >= rule.min_utterance_length
where:
- <contains-nonsilence> is true if the best traceback contains any nonsilence phone;
- <length-of-trailing-silence> is the length in seconds of silence phones at the end of the best traceback (we stop counting when we hit non-silence);
- <relative-cost> is a value >= 0 extracted from the decoder, which is zero if a final-state of the grammar FST had the best cost at the final frame, and infinity if no final-state was active (and >0 for in-between cases);
- <utterance-length> is the number of seconds of the utterance that we have decoded so far.
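The rule form above can be sketched in plain Python. This is a minimal illustration rather than the library's implementation: `Rule` is a hypothetical stand-in that only borrows the field names of `OnlineEndpointRule`, and `rule_activated` is a hypothetical helper.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for OnlineEndpointRule (same field names)."""
    must_contain_nonsilence: bool = True
    min_trailing_silence: float = 1.0
    max_relative_cost: float = math.inf
    min_utterance_length: float = 0.0


def rule_activated(rule, contains_nonsilence, trailing_silence,
                   relative_cost, utterance_length):
    """Return True only if every condition of a single rule holds."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


# A rule that needs some speech followed by at least 2 seconds of silence.
rule = Rule(must_contain_nonsilence=True, min_trailing_silence=2.0)
fires = rule_activated(rule, True, 2.3, math.inf, 4.0)     # speech was decoded
blocked = rule_activated(rule, False, 2.3, math.inf, 4.0)  # only silence so far
```

Here `fires` is true because all four conditions hold, while `blocked` is false because the traceback contains no nonsilence phone.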
All of these pieces of information are obtained from the best-path traceback from the decoder, which is output by the function get_best_path(). We do this every time we’re finished processing a chunk of data.
For details of the default rules, see OnlineEndpointConfig.
It’s up to the caller whether to use final-probs or not when generating the best-path, i.e. decoder.get_best_path(use_final_probs=True|False), but we recommend not using them. If you do use them, then depending on the grammar, you may force the best-path to decode non-silence even though that was not what it really preferred to decode.
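The trailing-silence quantity used by the rules can be illustrated with a small sketch. It assumes the best-path traceback has already been converted to one phone id per frame; the helper names, the phone ids and the 10 ms frame shift are hypothetical and not part of this module.

```python
def parse_silence_phones(spec):
    """Parse a colon-separated list such as "1:2:3" into a set of ints."""
    return {int(p) for p in spec.split(":") if p}


def trailing_silence_frames(frame_phones, silence_phones):
    """Count silence frames at the end of a per-frame phone sequence."""
    count = 0
    for phone in reversed(frame_phones):
        if phone not in silence_phones:
            break  # stop counting at the first non-silence frame
        count += 1
    return count


silence = parse_silence_phones("1:2")
best_path_phones = [1, 1, 7, 7, 9, 9, 9, 1, 1, 2]  # hypothetical phone ids
n_frames = trailing_silence_frames(best_path_phones, silence)
seconds = n_frames * 0.01  # assumes a 10 ms frame shift
```

The leading silence frames are ignored: only the run of silence after the last non-silence frame is counted.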
Functions
decoding_endpoint_detected | Determines if we should terminate decoding.
decoding_endpoint_detected_grammar | Determines if we should terminate decoding.
endpoint_detected | Determines if any of the endpointing rules are active for given arguments.
trailing_silence_length | Returns the number of trailing silence frames on the best-path traceback.
trailing_silence_length_grammar | Returns the number of trailing silence frames on the best-path traceback.
Classes
DecodableDiagGmmScaledOnline | Decodable for online decoding with diagonal GMMs.
OnlineEndpointConfig | Online endpointing configuration.
OnlineEndpointRule | Online endpointing rule.
OnlineFeaturePipeline | Online feature pipeline.
OnlineFeaturePipelineCommandLineConfig | Command-line configuration options for online feature pipeline.
OnlineFeaturePipelineConfig | Configuration options for online feature pipeline.
OnlineGmmAdaptationState | Online GMM adaptation state.
OnlineGmmDecodingAdaptationPolicyConfig | Configuration options for re-estimating basis-fMLLR during online decoding.
OnlineGmmDecodingConfig | Configuration options for online GMM decoding.
OnlineGmmDecodingModels | GMM models used for online decoding.
OnlineIvectorExtractionConfig | Command-line configuration options for online ivector extraction.
OnlineIvectorExtractionInfo | Configuration options for online iVector extraction.
OnlineIvectorExtractorAdaptationState | Adaptation state of the online ivector extractor.
OnlineIvectorFeature | Online ivector extractor.
OnlineNnetFeaturePipeline | Online feature pipeline for neural network decoding.
OnlineNnetFeaturePipelineConfig | Command-line configuration options for online neural network feature pipeline.
OnlineNnetFeaturePipelineInfo | Configuration options for online neural network feature pipeline.
OnlineSilenceWeighting | Online silence weighting.
OnlineSilenceWeightingConfig | Configuration options for online silence weighting.
SingleUtteranceGmmDecoder | Online lattice-generating decoder for diagonal GMM models.
SingleUtteranceNnetDecoder | Online lattice-generating decoder for neural network models.
SingleUtteranceNnetGrammarDecoder | Online lattice-generating decoder for neural network models.
-
class
kaldi.online2.
DecodableDiagGmmScaledOnline
(am, trans_model, scale, input_feats)¶ Decodable for online decoding with diagonal GMMs.
Parameters: - am (AmDiagGmm) – Diagonal GMM.
- trans_model (TransitionModel) – Transition model.
- scale (float) – Acoustic scale.
- input_feats (OnlineFeatureInterface) – Online input features.
-
is_last_frame
(frame:int) → bool¶ Checks if given frame is the last frame.
-
log_likelihood
(frame:int, index:int) → float¶ Returns the log-likelihood of the given index for the given frame.
-
num_frames_ready
() → int¶ Returns number of frames ready for decoding.
-
num_indices
() → int¶ Returns number of indices.
-
class
kaldi.online2.
OnlineEndpointConfig
¶ Online endpointing configuration.
Decoding is terminated if any of the endpointing rules evaluates to True.
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
rule1
¶ Default Rule1 times out after 5 seconds of silence even if nothing was decoded.
-
rule2
¶ Default Rule2 times out after 0.5 seconds of silence if a final-state was reached with good probability.
-
rule3
¶ Default Rule3 times out after 1.0 seconds of silence if a final-state was reached with OK probability.
-
rule4
¶ Default Rule4 times out after 2.0 seconds of silence after decoding something even if final-state was not reached.
-
rule5
¶ Default Rule5 times out after the utterance is 20.0 seconds.
-
silence_phones
¶ Colon separated list of phones to be considered as silence.
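The way the five default rules combine can be sketched as an OR over per-rule checks. The trailing-silence and utterance-length values below follow the rule descriptions above. The relative-cost thresholds 2.0 and 8.0 (standing in for "good" and "OK" probability) are assumptions taken from the upstream Kaldi defaults and should be checked against the installed version. `Rule`, `DEFAULT_RULES` and both helpers are hypothetical names, not part of this module.

```python
import math
from collections import namedtuple

Rule = namedtuple("Rule", "must_contain_nonsilence min_trailing_silence "
                          "max_relative_cost min_utterance_length")

DEFAULT_RULES = [
    Rule(False, 5.0, math.inf, 0.0),   # rule1: 5 s of silence, nothing decoded
    Rule(True, 0.5, 2.0, 0.0),         # rule2: assumed cost threshold 2.0
    Rule(True, 1.0, 8.0, 0.0),         # rule3: assumed cost threshold 8.0
    Rule(True, 2.0, math.inf, 0.0),    # rule4: final-state not required
    Rule(False, 0.0, math.inf, 20.0),  # rule5: utterance reached 20 s
]


def rule_activated(rule, contains_nonsilence, trailing_silence,
                   relative_cost, utterance_length):
    """Conjunction of the four conditions of one rule."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint(rules, contains_nonsilence, trailing_silence,
             relative_cost, utterance_length):
    """Disjunction over all rules: stop decoding if any rule fires."""
    return any(rule_activated(r, contains_nonsilence, trailing_silence,
                              relative_cost, utterance_length) for r in rules)


only_silence = endpoint(DEFAULT_RULES, False, 5.2, math.inf, 5.2)
short_pause_good_cost = endpoint(DEFAULT_RULES, True, 0.6, 1.0, 3.0)
short_pause_no_final = endpoint(DEFAULT_RULES, True, 0.6, math.inf, 3.0)
very_long_utterance = endpoint(DEFAULT_RULES, True, 0.0, math.inf, 20.5)
```

With these values, a short pause ends the utterance only when the decoder is in a final state with low relative cost; without a final state the caller has to wait for the longer rule4 silence.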
-
-
class
kaldi.online2.
OnlineEndpointRule
¶ Online endpointing rule.
Endpointing rule applies if all of the conditions are satisfied.
Parameters: - must_contain_nonsilence (bool) – If true, endpointing rule applies only if best-path traceback contains non-silence.
- min_trailing_silence (float) – Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.
- max_relative_cost (float) – Endpointing rule applies only if relative-cost of final-states <= this value.
- min_utterance_length (float) – Endpointing rule applies only if utterance length (in seconds) >= this value.
-
max_relative_cost
¶ Endpointing rule applies only if relative-cost of final-states <= this value.
-
min_trailing_silence
¶ Endpointing rule applies only if duration of trailing silence (in seconds) >= this value.
-
min_utterance_length
¶ Endpointing rule applies only if utterance length (in seconds) >= this value.
-
must_contain_nonsilence
¶ If true, endpointing rule applies only if best-path traceback contains non-silence.
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
register_with_prefix
(prefix:str, opts:OptionsItf)¶ Registers prefixed options with an object implementing the options interface.
Parameters: - prefix (str) – String that will be prepended to option names.
- opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
class
kaldi.online2.
OnlineFeaturePipeline
¶ Online feature pipeline.
This class is responsible for putting together the various stages of the feature-processing pipeline, in an online setting. It does not attempt to be fully generic; it just handles the common case. Since the online-decoding code needs to “know about” things like CMN and fMLLR in order to do adaptation, it’s hard to make this completely generic.
Parameters: config (OnlineFeaturePipelineConfig) – Configuration options for online feature pipeline.
-
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts more data to process.
It won’t actually process the data, it will just copy it.
Parameters:
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
freeze_cmvn
()¶ Freezes CMVN.
Raises: RuntimeError – If num_frames_ready() == 0.
-
get_cmvn_state
() → OnlineCmvnState¶ Returns the CMVN state.
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
have_fmllr_transform
() → bool¶ Returns True if an fMLLR transform has been set.
-
input_finished
()¶ Tells the class that you won’t be providing any more waveform.
This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
new
() → OnlineFeaturePipeline¶ Returns a newly initialized copy.
This does not duplicate all the internal state or the speaker-adaptation state, but gives you a freshly initialized version of this object, as if you had initialized it using the constructor that takes the configuration object. After calling this you may want to call set_cmvn_state() and set_transform().
-
num_frames_ready
() → int¶ Returns number of frames ready
-
set_cmvn_state
(cmvn_state:OnlineCmvnState)¶ Sets the CMVN state.
-
set_transform
(transform:MatrixBase)¶ Sets the fMLLR transform.
Call it with an empty matrix if you want to stop it using any transform.
-
-
class
kaldi.online2.
OnlineFeaturePipelineCommandLineConfig
¶ Command-line configuration options for online feature pipeline.
This configuration class is used to set up OnlineFeaturePipelineConfig, which in turn is the configuration class for OnlineFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.
-
add_deltas
¶ Append delta features (default=False)
-
add_pitch
¶ Append pitch features to raw MFCC/PLP features (default=False)
-
cmvn_config
¶ Configuration file for online CMVN features (e.g. conf/online_cmvn.conf)
-
delta_config
¶ Configuration file for delta features (e.g. conf/delta.conf)
If not supplied, will not compute delta features; supply empty config to use defaults.
-
fbank_config
¶ Configuration file for filterbank features (e.g. conf/fbank.conf)
-
feature_type
¶ Base feature type [mfcc (default), plp, fbank].
-
global_cmvn_stats_rxfilename
¶ Extended filename for global CMVN stats (e.g. ‘ark* – matrix-sum scp* – data/train/cmvn.scp’)
-
lda_rxfilename
¶ Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)
-
mfcc_config
¶ Configuration file for MFCC features (e.g. conf/mfcc.conf)
-
pitch_config
¶ Configuration file for pitch features (e.g. conf/pitch.conf)
-
pitch_process_config
¶ Configuration file for post-processing pitch features (e.g. conf/pitch_process.conf)
-
plp_config
¶ Configuration file for PLP features (e.g. conf/plp.conf)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
splice_config
¶ Configuration file for feature splicing, if done (e.g. prior to LDA)
-
splice_feats
¶ Splice features with left and right context (default=False)
-
-
class
kaldi.online2.
OnlineFeaturePipelineConfig
¶ Configuration options for online feature pipeline.
This configuration class is responsible for storing the configuration options for OnlineFeaturePipeline. The options can be set either directly in code or indirectly by reading config files on disk via OnlineFeaturePipelineCommandLineConfig.
-
add_deltas
¶ Append delta features (default=True)
-
add_pitch
¶ Append pitch features to raw MFCC/PLP features (default=False)
-
cmvn_opts
¶ Options for online CMVN features
-
delta_opts
¶ Options for delta features
-
fbank_opts
¶ Options for filterbank features
-
feature_type
¶ Base feature type [mfcc (default), plp, fbank]
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds.
-
from_config
(cmdline_config:OnlineFeaturePipelineCommandLineConfig) → OnlineFeaturePipelineConfig¶ Creates a new OnlineFeaturePipelineConfig from OnlineFeaturePipelineCommandLineConfig.
-
global_cmvn_stats_rxfilename
¶ Extended filename for global CMVN stats (e.g. ‘ark* – matrix-sum scp* – data/train/cmvn.scp’)
-
lda_rxfilename
¶ Extended filename for LDA or LDA+MLLT matrix, if using LDA (e.g. exp/foo/final.mat)
-
mfcc_opts
¶ Options for MFCC features
-
pitch_opts
¶ Options for pitch features
-
pitch_process_opts
¶ Options for post-processing pitch features
-
plp_opts
¶ Options for PLP features
-
splice_feats
¶ Splice features with left and right context (default=False)
-
splice_opts
¶ Options for feature splicing, if done
-
-
class
kaldi.online2.
OnlineGmmAdaptationState
¶ Online GMM adaptation state.
-
cmvn_state
¶ Online CMVN state
-
read
(in_stream:istream, binary:bool)¶ Reads this object from input stream.
-
spk_stats
¶ Speaker transform stats.
-
transform
¶ Transform matrix
-
write
(out_stream:ostream, binary:bool)¶ Writes this object to output stream.
-
-
class
kaldi.online2.
OnlineGmmDecodingAdaptationPolicyConfig
¶ Configuration options for re-estimating basis-fMLLR during online decoding.
-
adaptation_delay
¶ Delay before first basis-fMLLR adaptation for not-first utterances of each speaker
-
adaptation_first_utt_delay
¶ Delay before first basis-fMLLR adaptation for first utterance of each speaker
-
adaptation_first_utt_ratio
¶ Ratio that controls frequency of fMLLR adaptation for first utterance of each speaker
-
adaptation_ratio
¶ Ratio that controls frequency of fMLLR adaptation for not-first utterances of each speaker
-
check
()¶ Checks if configuration is valid.
-
do_adapt
(chunk_begin_secs:float, chuck_end_secs:float, is_first_utterance:bool) → bool¶ Checks if we are scheduled to re-estimate fMLLR.
Returns: True if we are scheduled to re-estimate fMLLR in the interval [chunk_begin_secs, chuck_end_secs), False otherwise.
Return type: bool
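One way to read the delay and ratio options is as a geometric schedule of adaptation times: delay, delay * ratio, delay * ratio^2, and so on. The sketch below checks whether any such time falls inside a chunk. The schedule itself is an assumption about how the options interact, and this standalone `do_adapt` helper (which takes delay and ratio explicitly) is hypothetical, not the method documented above.

```python
def do_adapt(chunk_begin_secs, chunk_end_secs, delay, ratio):
    """Return True if a scheduled adaptation time falls in [begin, end).

    Assumed schedule: delay, delay * ratio, delay * ratio**2, ...
    """
    if delay <= 0.0 or ratio <= 1.0:
        raise ValueError("expects delay > 0 and ratio > 1")
    t = delay
    while t < chunk_end_secs:
        if t >= chunk_begin_secs:
            return True
        t *= ratio
    return False


# With delay=2.0 and ratio=1.5 the assumed schedule is 2.0, 3.0, 4.5, ...
before_first = do_adapt(0.0, 1.0, delay=2.0, ratio=1.5)
covers_first = do_adapt(1.9, 2.1, delay=2.0, ratio=1.5)
between_times = do_adapt(3.1, 4.4, delay=2.0, ratio=1.5)
```

Under this reading, adaptation becomes less frequent as the utterance gets longer, since consecutive scheduled times grow further apart.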
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
-
class
kaldi.online2.
OnlineGmmDecodingConfig
¶ Configuration options for online GMM decoding.
-
acoustic_scale
¶ Scaling factor for acoustic likelihoods
-
adaptation_policy_opts
¶ Options for re-estimating basis-fMLLR during online decoding
-
basis_opts
¶ Options for basis-fMLLR adaptation
-
faster_decoder_opts
¶ Options for lattice-generating faster decoder.
-
fmllr_basis_rxfilename
¶ Extended filename for reading the basis elements for basis-fMLLR.
-
fmllr_lattice_beam
¶ Beam used in pruning lattices for fMLLR estimation
-
model_rxfilename
¶ Extended filename for reading the model used to estimate fMLLR transforms.
This is required.
-
online_alimdl_rxfilename
¶ Extended filename for reading the model trained with online-CMN features.
This is only needed if it is different from model_rxfilename.
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
rescore_model_rxfilename
¶ Extended filename for reading the discriminatively trained model.
This is only needed if it is different from model_rxfilename.
-
silence_phones
¶ Colon-separated list of integer ids of silence phones
-
silence_weight
¶ Weight applied to silence frames for fMLLR estimation.
This has an effect only if the silence_phones option is supplied.
-
-
class
kaldi.online2.
OnlineGmmDecodingModels
(config)¶ GMM models used for online decoding.
This class is used to read, store and give access to the models used for 3 phases of decoding (first-pass with online-CMN features; the ML models used for estimating transforms; and the discriminatively trained models). It takes care of the logic whereby if, say, the last model isn’t given we default to the second model, and so on, and it interprets the filenames from the config object.
Parameters: config (OnlineGmmDecodingConfig) – Options for online GMM decoding.
-
basis_fmllr_estimate
() → BasisFmllrEstimate¶ Returns the basis elements for basis-fMLLR.
-
get_final_model
() → AmDiagGmm¶ Returns the discriminatively trained model.
If supplied in the config, otherwise it returns the ML-trained model.
-
get_model
() → AmDiagGmm¶ Returns the ML-trained model used to get transforms.
-
get_online_alignment_model
() → AmDiagGmm¶ Returns the model trained with online-CMN features.
If supplied in the config, otherwise it returns the ML-trained model.
-
get_transition_model
() → TransitionModel¶ Returns the transition model.
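The fallback logic described for OnlineGmmDecodingModels can be sketched as a small filename-resolution step. This is an illustration under the assumption that an unset option is an empty string; `resolve_model_filenames` and the example path are hypothetical.

```python
def resolve_model_filenames(model_rxfilename,
                            online_alimdl_rxfilename="",
                            rescore_model_rxfilename=""):
    """Pick the file for each decoding phase, falling back to the ML model."""
    if not model_rxfilename:
        raise ValueError("model_rxfilename is required")
    return {
        # first pass with online-CMN features
        "online_alignment": online_alimdl_rxfilename or model_rxfilename,
        # ML model used for estimating transforms
        "ml": model_rxfilename,
        # discriminatively trained model used for final rescoring
        "final": rescore_model_rxfilename or model_rxfilename,
    }


# Hypothetical path; only the required model is supplied.
resolved = resolve_model_filenames("exp/foo/final.mdl")
```

When only `model_rxfilename` is given, all three phases share the same model, which matches the behaviour described for get_online_alignment_model() and get_final_model().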
-
-
class
kaldi.online2.
OnlineIvectorExtractionConfig
¶ Command-line configuration options for online ivector extraction.
This class includes configuration variables relating to the online ivector extraction, but not including configuration for the “base feature”, i.e. MFCC/PLP/filterbank, which is an input to this feature.
This configuration class is used to set up OnlineIvectorExtractionInfo, which in turn is the configuration class for OnlineIvectorFeature. Instead of taking the options for each part of the online ivector extractor directly, it reads in the configuration file for each part.
-
cmvn_config_rxfilename
¶ Extended filename for reading the online CMVN configuration file.
-
diag_ubm_rxfilename
¶ Extended filename for reading the diagonal UBM used for obtaining the posteriors.
-
global_cmvn_stats_rxfilename
¶ Extended filename for reading the global CMVN stats.
-
greedy_ivector_extractor
¶ Whether to read ahead as much as we can when computing ivector stats (default=False).
-
ivector_extractor_rxfilename
¶ Extended filename for reading the ivector extractor.
-
ivector_period
¶ Online ivector period (default=10).
-
lda_mat_rxfilename
¶ Extended filename for reading the LDA matrix.
-
max_count
¶ If nonzero, maximum stats count we allow before scaling down stats (default=0.0).
-
max_remembered_frames
¶ Largest number of frames to remember between utterances of the same speaker (default=1000).
-
min_post
¶ Threshold for posterior pruning in ivector extraction (default=0.025).
-
num_cg_iters
¶ Number of iterations (default=15).
-
num_gselect
¶ Maximum number of posteriors to use per frame in ivector extraction (default=5).
-
posterior_scale
¶ Scale on posteriors used in ivector extraction (default=0.1).
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
splice_config_rxfilename
¶ Extended filename for reading the online frame splicing configuration file.
-
use_most_recent_ivector
¶ Whether to return the most recent ivector rather than the one for current frame (default=True).
-
-
class
kaldi.online2.
OnlineIvectorExtractionInfo
¶ Configuration options for online iVector extraction.
-
check
()¶ Checks if configuration options are valid.
-
cmvn_opts
¶ Online CMVN options
-
diag_ubm
¶ Diagonal UBM
-
extractor
¶ Ivector extractor
-
from_config
(config:OnlineIvectorExtractionConfig) → _OnlineIvectorExtractionInfo¶ Creates a new OnlineIvectorExtractionInfo from an OnlineIvectorExtractionConfig.
-
global_cmvn_stats
¶
-
greedy_ivector_extractor
¶ Whether to read ahead as much as we can when computing ivector stats (default=False).
-
init
(config:OnlineIvectorExtractionConfig)¶ Initializes with the given config.
-
ivector_period
¶ Online ivector period (default=10).
-
lda_mat
¶
-
max_count
¶ If nonzero, maximum stats count we allow before scaling down stats (default=0.0).
-
max_remembered_frames
¶ Largest number of frames to remember between utterances of the same speaker (default=1000).
-
min_post
¶ Threshold for posterior pruning in ivector extraction (default=0.025).
-
num_cg_iters
¶ Number of iterations (default=15).
-
num_gselect
¶ Maximum number of posteriors to use per frame in ivector extraction (default=5).
-
posterior_scale
¶ Scale on posteriors used in ivector extraction (default=0.1).
-
splice_opts
¶ Online frame splicing options
-
use_most_recent_ivector
¶ Whether to return the most recent ivector rather than the one for current frame (default=True).
-
-
class
kaldi.online2.
OnlineIvectorExtractorAdaptationState
¶ Adaptation state of the online ivector extractor.
This class stores the adaptation state from the online ivector extractor, which can help you to initialize the adaptation state for the next utterance of the same speaker in a more informed way.
-
cmvn_state
¶ Online CMVN state (used for getting posteriors for ivector extraction)
-
from_info
(info:_OnlineIvectorExtractionInfo) → OnlineIvectorExtractorAdaptationState¶ Creates a new OnlineIvectorExtractorAdaptationState from OnlineIvectorExtractionInfo.
-
from_other
(other:OnlineIvectorExtractorAdaptationState) → OnlineIvectorExtractorAdaptationState¶ Creates a new OnlineIvectorExtractorAdaptationState from another.
-
ivector_stats
¶ Stats for online ivector estimation
-
limit_frames
(max_remembered_frames:float, posterior_scale:float)¶ Limits the frames.
Scales down the stats if needed to ensure the number of frames in the speaker-specific CMVN stats does not exceed max_remembered_frames.
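The scaling step for the CMVN stats can be sketched as follows. The sketch assumes the usual Kaldi layout of a 2 x (dim + 1) stats matrix with the frame count stored in the last column of the first row. That layout, the helper name and the numbers are assumptions. The real method also takes posterior_scale, which is not modelled here.

```python
def limit_cmvn_frames(stats, max_remembered_frames):
    """Scale CMVN stats in place so the frame count stays within the limit."""
    count = stats[0][-1]  # assumed position of the frame count
    if count > max_remembered_frames:
        scale = max_remembered_frames / count
        for row in stats:
            for i in range(len(row)):
                row[i] *= scale
    return stats


stats = [[200.0, -50.0, 2000.0],  # per-dimension sums, then frame count
         [90.0, 40.0, 0.0]]       # per-dimension sums of squares
limit_cmvn_frames(stats, 1000.0)
```

Because sums and count are scaled by the same factor, the per-dimension means are unchanged; only the weight that the remembered history carries into the next utterance is reduced.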
-
read
(is:istream, binary:bool)¶ Reads this object from input stream.
-
write
(os:ostream, binary:bool)¶ Writes this object to output stream.
-
-
class
kaldi.online2.
OnlineIvectorFeature
¶ Online ivector extractor.
This class extracts online ivectors from raw features such as MFCC, PLP or filterbank.
Parameters: - info (OnlineIvectorExtractionInfo) – Options for online ivector extraction.
- base_feature (OnlineFeatureInterface) – Raw features (MFCC, PLP or filterbank).
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_adaptation_state
(adaptation_state:OnlineIvectorExtractorAdaptationState)¶ Gets online iVector adaptation state.
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
num_frames
() → float¶ Returns number of frames seen
-
num_frames_ready
() → int¶ Returns number of frames ready
-
objf_impr_per_frame
() → float¶ Returns Objective improvement per frame from iVector estimation
-
set_adaptation_state
(adaptation_state:OnlineIvectorExtractorAdaptationState)¶ Sets online iVector adaptation state.
-
ubm_loglike_per_frame
() → float¶ Returns UBM log-like per frame
-
update_frame_weights
(delta_weights:list<tuple<int, float>>)¶ Updates frame weights.
-
class
kaldi.online2.
OnlineNnetFeaturePipeline
¶ Online feature pipeline for neural network decoding.
This is a different version of the online feature pipeline specialized for use in neural network decoding with iVectors. Our recipe is that we extract iVectors that will be used as an additional input to the neural network, in addition to a window of several frames of spliced raw features (MFCC, PLP or filterbanks). The iVectors are extracted on top of a (splice+LDA+MLLT) feature pipeline, with the added complication that the GMM posteriors used for the iVector extraction are obtained with a version of the features that has online cepstral mean (and optionally variance) normalization, whereas the stats for iVector are accumulated with a non-mean-normalized version of the features. The idea here is that we want the iVector to learn the mean offset, but we want the posteriors to be somewhat invariant to mean offsets.
Parameters: config (OnlineNnetFeaturePipelineInfo) – Configuration options for online neural network feature pipeline.
-
accept_waveform
(sampling_rate:float, waveform:VectorBase)¶ Accepts more data to process.
It won’t actually process the data, it will just copy it.
Parameters:
-
dim
() → int¶ Returns feature dimension
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds
-
get_adaptation_state
(adaptation_state:OnlineIvectorExtractorAdaptationState)¶ Gets online iVector adaptation state.
-
get_frame
(frame:int, feat:VectorBase)¶ Returns the features for given frame index
-
get_frames
(frames:list<int>, feats:MatrixBase)¶ Returns the features for given frame indices
-
input_feature
() → OnlineFeatureInterface¶ Returns the part of the feature pipeline that would be given as the primary (non-iVector) input to neural network.
-
input_finished
()¶ Tells the class that you won’t be providing any more waveform.
This will help flush out the last few frames of delta or LDA features, and finalize the pitch features (making them more accurate).
-
is_last_frame
(frame:int) → bool¶ Returns True if this is last frame, otherwise False
-
ivector_feature
() → OnlineIvectorFeature¶ Returns the ivector-extraction part of the feature pipeline (or None if iVectors are not being used).
-
num_frames_ready
() → int¶ Returns number of frames ready
-
set_adaptation_state
(adaptation_state:OnlineIvectorExtractorAdaptationState)¶ Sets online iVector adaptation state.
-
-
class
kaldi.online2.
OnlineNnetFeaturePipelineConfig
¶ Command-line configuration options for online neural network feature pipeline.
This configuration class is used to set up OnlineNnetFeaturePipelineInfo, which in turn is the configuration class for OnlineNnetFeaturePipeline. Instead of taking the options for the parts of the feature pipeline directly, it reads in the configuration files for each part.
-
add_pitch
¶ Append pitch features to raw MFCC/PLP features (default=False)
-
fbank_config
¶ Configuration file for filterbank features (e.g. conf/fbank.conf)
-
feature_type
¶ Base feature type [mfcc (default), plp, fbank].
-
ivector_extraction_config
¶ Configuration file for online iVector extraction (e.g. conf/ivector.conf)
-
mfcc_config
¶ Configuration file for MFCC features (e.g. conf/mfcc.conf)
-
online_pitch_config
¶ Configuration file for online pitch features (e.g. conf/online_pitch.conf)
-
plp_config
¶ Configuration file for PLP features (e.g. conf/plp.conf)
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
silence_weighting_config
¶ Options for online silence weighting
-
-
class
kaldi.online2.
OnlineNnetFeaturePipelineInfo
¶ Configuration options for online neural network feature pipeline.
This configuration class is responsible for storing the configuration options for OnlineNnetFeaturePipeline (including the actual LDA and CMVN-stats matrices, and the iVector extractor). The options can be set either directly in code or indirectly by reading config files on disk via OnlineNnetFeaturePipelineConfig.
-
add_pitch
¶ Append pitch features to raw MFCC/PLP features (default=False)
-
fbank_opts
¶ Options for filterbank features
-
feature_type
¶ Base feature type [mfcc (default), plp, fbank]
-
frame_shift_in_seconds
() → float¶ Returns frame shift in seconds.
-
from_config
(config:OnlineNnetFeaturePipelineConfig) → OnlineNnetFeaturePipelineInfo¶ Creates a new OnlineNnetFeaturePipelineInfo from OnlineNnetFeaturePipelineConfig.
-
ivector_dim
() → int¶ Returns iVector dimension.
-
ivector_extractor_info
¶ Options for online iVector extraction.
-
mfcc_opts
¶ Options for MFCC features
-
pitch_opts
¶ Options for pitch features
-
pitch_process_opts
¶ Options for post-processing pitch features
-
plp_opts
¶ Options for PLP features
-
silence_weighting_config
¶ Options for weighting silence in iVector adaptation.
-
use_ivectors
¶ Use iVectors as an extra input to the neural net
-
-
class
kaldi.online2.
OnlineSilenceWeighting
¶ Online silence weighting.
This class is responsible for keeping track of the best-path traceback from the decoder (efficiently) and computing a weighting of the data based on the classification of frames as silence (or not silence). It also applies a duration limit, so data from a very long run of the same transition-id is weighted down (such runs are often associated with misrecognition or silence).
Parameters: - trans_model (TransitionModel) – The transition model.
- config (OnlineSilenceWeightingConfig) – Options for online silence weighting.
- frame_subsampling_factor (int) – Frame subsampling factor (default=1).
-
active
() → bool¶ Returns true if list of silence phones is not empty and silence weight is not 1.0
-
compute_current_traceback
(decoder:LatticeFasterOnlineDecoder)¶ Computes current traceback.
-
compute_current_traceback_grammar
(decoder:LatticeFasterOnlineGrammarDecoder)¶ Computes current traceback.
-
get_delta_weights
(num_frames_ready_in:int) → list<tuple<int, float>>¶ Gets the changes in frame weights.
Parameters: num_frames_ready_in (int) – Number of frames available at the input of the online iVector extractor. Returns: Delta weights as list of (frame-index, delta-weight) tuples. Return type: List[Tuple[int, float]]
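The idea of reporting weight changes as (frame-index, delta-weight) tuples can be sketched in plain Python. This is not the library's algorithm: it ignores max_state_duration and new_data_weight, assumes every frame starts with weight 1.0, and uses a hypothetical `delta_weights` helper operating on per-frame silence flags.

```python
def delta_weights(is_silence, previous_weights, silence_weight):
    """Return (frame, change) pairs for frames whose weight changed.

    previous_weights is updated in place to the newly wanted weights.
    """
    deltas = []
    for frame, silent in enumerate(is_silence):
        wanted = silence_weight if silent else 1.0
        diff = wanted - previous_weights[frame]
        if diff != 0.0:
            deltas.append((frame, diff))
            previous_weights[frame] = wanted
    return deltas


previous = [1.0, 1.0, 1.0, 1.0]  # assumed initial weight of every frame
first = delta_weights([True, False, False, True], previous, silence_weight=0.0)
# The traceback changed: frame 1 is now silence, frame 3 no longer is.
second = delta_weights([True, True, False, False], previous, silence_weight=0.0)
```

Only frames whose classification changed between tracebacks appear in the second result, which is the shape of input that update_frame_weights() of OnlineIvectorFeature accepts.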
-
class
kaldi.online2.
OnlineSilenceWeightingConfig
¶ Configuration options for online silence weighting.
-
active
() → bool¶ Returns true if list of silence phones is not empty and silence weight is not 1.0
-
max_state_duration
¶ Maximum allowed duration of a single transition-id.
-
new_data_weight
¶ Scale applied to data for which there is no decoder traceback yet.
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
register_with_prefix
(prefix:str, opts:OptionsItf)¶ Registers prefixed options with an object implementing the options interface.
Parameters: - prefix (str) – String that will be prepended to option names.
- opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-
silence_phones_str
¶ Colon or comma separated list of integer ids for silence phones.
-
silence_weight
¶ Weighting factor for silence frames.
-
-
class
kaldi.online2.
SingleUtteranceGmmDecoder
¶ Online lattice-generating decoder for diagonal GMM models.
This class is used for decoding a single utterance in an online fashion using diagonal GMMs.
Parameters: - config (OnlineGmmDecodingConfig) – Options for online GMM decoding.
- models (OnlineGmmDecodingModels) – Models for online GMM decoding.
- feature_prototype (OnlineFeaturePipeline) – Online feature pipeline.
- fst (StdFst) – Decoding graph.
- adaptation_state (OnlineGmmAdaptationState) – Online GMM adaptation state.
-
advance_decoding
()¶ Advances the decoding until there are no more frames to decode.
This may also estimate fMLLR after advancing the decoding, depending on the configuration.
-
endpoint_detected
(config:OnlineEndpointConfig) → bool¶ Determines if we should terminate decoding current utterance.
Parameters: config (OnlineEndpointConfig) – Online endpointing configuration.
Returns: True if an endpointing rule is active.
Return type: bool
-
estimate_fmllr
(end_of_utterance:bool)¶ Estimates the [basis-]fMLLR transform and applies it to the features.
-
feature_pipeline
() → OnlineFeaturePipeline¶ Returns the online feature pipeline.
-
final_relative_cost
() → float¶ Returns the final relative cost.
-
finalize_decoding
()¶ Finalizes the decoding.
-
get_adaptation_state
(adaptation_state:OnlineGmmAdaptationState)¶ Returns the adaptation state.
-
get_best_path
(end_of_utterance:bool) → LatticeVectorFst¶ Gets best path as a lattice.
Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The best path.
Return type: LatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
-
get_lattice
(rescore_if_needed:bool, end_of_utterance:bool) → CompactLatticeVectorFst¶ Gets the lattice-determinized compact lattice.
The output is a deterministic compact lattice with a unique path for each word sequence.
Parameters: - rescore_if_needed (bool) – If this is True and there is any point in rescoring the state-level lattice, it will rescore the lattice.
- end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The lattice-determinized compact lattice.
Return type: CompactLatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
-
have_transform
() → bool¶ Returns True if we already have an fMLLR transform.
-
class
kaldi.online2.
SingleUtteranceNnetDecoder
¶ Online lattice-generating decoder for neural network models.
This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models.
Parameters: - decoder_opts (LatticeFasterDecoderOptions) – Configuration options for lattice-generating decoder.
- trans_model (TransitionModel) – Transition model.
- info (DecodableNnetSimpleLoopedInfo) – Static pre-computed information needed for nnet3 computation (including a reference to the model).
- fst (StdFst) – Decoding graph.
- features (OnlineNnetFeaturePipeline) – Online feature pipeline.
-
advance_decoding
()¶ Advances decoding until there are no more frames to decode.
-
decoder
() → LatticeFasterOnlineDecoder¶ Returns the underlying decoder object.
Note
The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder.LatticeFasterOnlineDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineDecoder. Hence, it does not support the additional decoder API implemented in Python.
-
endpoint_detected
(config:OnlineEndpointConfig) → bool¶ Determines if we should terminate decoding the current utterance.
Parameters: config (OnlineEndpointConfig) – Online endpointing configuration.
Returns: True if an endpointing rule is active.
Return type: bool
-
finalize_decoding
()¶ Finalizes decoding.
This method may optionally be called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.
-
get_best_path
(end_of_utterance:bool) → LatticeVectorFst¶ Gets best path as a lattice.
Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The best path.
Return type: LatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
-
get_lattice
(end_of_utterance:bool) → CompactLatticeVectorFst¶ Gets the lattice-determinized compact lattice.
The output is a deterministic compact lattice with a unique path for each word sequence.
Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The lattice-determinized compact lattice.
Return type: CompactLatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
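The methods above are typically called in a fixed order: advance decoding as data arrives, check for an endpoint, then finalize and extract the lattice. The following is a minimal sketch of that loop, written only against the method names documented for this class. FakeDecoder is a stand-in so the example runs without Kaldi, and feed_chunk is a hypothetical caller-supplied callable that pushes audio into the feature pipeline; signalling end of input to the pipeline is left to the caller.

```python
def decode_until_endpoint(decoder, feed_chunk, chunks, endpoint_config):
    """Decode chunk by chunk until an endpoint fires or the input ends.

    `decoder` is any object exposing advance_decoding(), endpoint_detected(),
    finalize_decoding() and get_lattice(). `feed_chunk` is a hypothetical
    callable that hands one chunk of audio to the feature pipeline.
    """
    endpointed = False
    for chunk in chunks:
        feed_chunk(chunk)
        decoder.advance_decoding()
        if decoder.endpoint_detected(endpoint_config):
            endpointed = True
            break
    # Optional extra pruning step before the lattice is extracted.
    decoder.finalize_decoding()
    # Assumption: the utterance is treated as finished in both cases.
    lattice = decoder.get_lattice(True)
    return lattice, endpointed


class FakeDecoder:
    """Stand-in decoder used only to exercise the loop without Kaldi."""

    def __init__(self, endpoint_after):
        self.calls = 0
        self.endpoint_after = endpoint_after
        self.finalized = False

    def advance_decoding(self):
        self.calls += 1

    def endpoint_detected(self, config):
        return self.calls >= self.endpoint_after

    def finalize_decoding(self):
        self.finalized = True

    def get_lattice(self, end_of_utterance):
        return ("lattice", self.calls, end_of_utterance)
```

With a real decoder, the same driver would receive a SingleUtteranceNnetDecoder and an OnlineEndpointConfig in place of the stand-ins.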
-
class
kaldi.online2.
SingleUtteranceNnetGrammarDecoder
¶ Online lattice-generating decoder for neural network models.
This class is used for decoding a single utterance in an online fashion using (nnet3) neural network models.
Parameters: - decoder_opts (LatticeFasterDecoderOptions) – Configuration options for lattice-generating decoder.
- trans_model (TransitionModel) – Transition model.
- info (DecodableNnetSimpleLoopedInfo) – Static pre-computed information needed for nnet3 computation (including a reference to the model).
- fst (GrammarFst) – Decoding graph.
- features (OnlineNnetFeaturePipeline) – Online feature pipeline.
-
advance_decoding
()¶ Advances decoding until there are no more frames to decode.
-
decoder
() → LatticeFasterOnlineGrammarDecoder¶ Returns the underlying decoder object.
Note
The decoder object returned by this method is an instance of kaldi.decoder._lattice_faster_online_decoder_ext.LatticeFasterOnlineGrammarDecoder, not an instance of kaldi.decoder.LatticeFasterOnlineGrammarDecoder. Hence, it does not support the additional decoder API implemented in Python.
-
endpoint_detected
(config:OnlineEndpointConfig) → bool¶ Determines if we should terminate decoding the current utterance.
Parameters: config (OnlineEndpointConfig) – Online endpointing configuration.
Returns: True if an endpointing rule is active.
Return type: bool
-
finalize_decoding
()¶ Finalizes decoding.
This method may optionally be called after the last call to advance_decoding(). It does an extra pruning step to prune the lattices output by get_lattice() more accurately.
-
get_best_path
(end_of_utterance:bool) → LatticeVectorFst¶ Gets best path as a lattice.
Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The best path.
Return type: LatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
-
get_lattice
(end_of_utterance:bool) → CompactLatticeVectorFst¶ Gets the lattice-determinized compact lattice.
The output is a deterministic compact lattice with a unique path for each word sequence.
Parameters: end_of_utterance (bool) – If True and a final state of the graph is reached, then the output will include final probabilities given by the graph. Otherwise all final probabilities are treated as one.
Returns: The lattice-determinized compact lattice.
Return type: CompactLatticeVectorFst
Raises: RuntimeError – In the unusual circumstances where no tokens survive.
-
kaldi.online2.
decoding_endpoint_detected
(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineDecoder) → bool¶ Determines if we should terminate decoding.
This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.
Parameters: - config (OnlineEndpointConfig) – Online endpointing configuration.
- tmodel (TransitionModel) – Transition model.
- frame_shift_in_seconds (float) – Frame shift (in seconds).
- decoder (LatticeFasterOnlineDecoder) – Online lattice-generating decoder.
Returns: True if the endpointing rules determine that we should terminate decoding.
Return type: bool
-
kaldi.online2.
decoding_endpoint_detected_grammar
(config:OnlineEndpointConfig, tmodel:TransitionModel, frame_shift_in_seconds:float, decoder:LatticeFasterOnlineGrammarDecoder) → bool¶ Determines if we should terminate decoding.
This is a higher-level convenience function that works out the arguments to the endpoint_detected() function.
Parameters: - config (OnlineEndpointConfig) – Online endpointing configuration.
- tmodel (TransitionModel) – Transition model.
- frame_shift_in_seconds (float) – Frame shift (in seconds).
- decoder (LatticeFasterOnlineGrammarDecoder) – Online lattice-generating decoder.
Returns: True if the endpointing rules determine that we should terminate decoding.
Return type: bool
-
kaldi.online2.
endpoint_detected
(config:OnlineEndpointConfig, num_frames_decoded:int, trailing_silence_frames:int, frame_shift_in_seconds:float, final_relative_cost:float) → bool¶ Determines if any of the endpointing rules are active for given arguments.
Parameters: - config (OnlineEndpointConfig) – Online endpointing configuration.
- num_frames_decoded (int) – Number of frames decoded.
- trailing_silence_frames (int) – Number of trailing silence frames decoded.
- frame_shift_in_seconds (float) – Frame shift (in seconds).
- final_relative_cost (float) – Relative cost of final states.
Returns: True if the endpointing rules determine that we should terminate decoding.
Return type: bool
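To make the rule form from the module overview concrete, here is a minimal pure-Python sketch of how the arguments of this function could be turned into an endpointing decision. It is not the library implementation. The Rule class and function names are hypothetical, the rule values used with it are illustrative rather than the library defaults, and inferring <contains-nonsilence> from the two lengths is an assumption of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical container mirroring the four fields of one rule."""
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


def rule_activated(rule, trailing_silence, relative_cost, utterance_length):
    # Assumption: some non-silence was decoded if the utterance is longer
    # than its trailing silence.
    contains_nonsilence = utterance_length > trailing_silence
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected_sketch(rules, num_frames_decoded, trailing_silence_frames,
                             frame_shift_in_seconds, final_relative_cost):
    # Convert frame counts to seconds, then OR the rules together.
    utterance_length = num_frames_decoded * frame_shift_in_seconds
    trailing_silence = trailing_silence_frames * frame_shift_in_seconds
    return any(rule_activated(r, trailing_silence, final_relative_cost,
                              utterance_length) for r in rules)


# Illustrative rules only: one fires after a short pause following speech,
# the other after a long silence regardless of content.
example_rules = [Rule(True, 0.5, 2.0, 0.0), Rule(False, 5.0, math.inf, 0.0)]
```

For example, with a 10 ms frame shift, 300 decoded frames of which the last 80 are silence, and a relative cost of 1.0, the first illustrative rule is satisfied.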
-
kaldi.online2.
trailing_silence_length
(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineDecoder) → int¶ Returns the number of trailing silence frames on the best-path traceback.
Parameters: - tmodel (TransitionModel) – Transition model.
- silence_phones (str) – Colon-separated list of integer ids of silence phones.
- decoder (LatticeFasterOnlineDecoder) – Online decoder.
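The counting itself can be sketched independently of the decoder. The example below assumes the best-path traceback has already been mapped to one phone id per frame, a step the real function performs through the transition model; count_trailing_silence is a hypothetical name, not part of kaldi.online2.

```python
def count_trailing_silence(frame_phones, silence_phones):
    """Count silence frames at the end of a per-frame phone id sequence.

    Hypothetical helper for illustration only. Counting stops at the
    first non-silence frame encountered when walking backwards.
    """
    silence = set(silence_phones)
    count = 0
    for phone in reversed(frame_phones):
        if phone not in silence:
            break
        count += 1
    return count
```

Multiplying the result by the frame shift gives the trailing silence in seconds, which is the quantity the endpointing rules compare against.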
-
kaldi.online2.
trailing_silence_length_grammar
(tmodel:TransitionModel, silence_phones:str, decoder:LatticeFasterOnlineGrammarDecoder) → int¶ Returns the number of trailing silence frames on the best-path traceback.
Parameters: - tmodel (TransitionModel) – Transition model.
- silence_phones (str) – Colon-separated list of integer ids of silence phones.
- decoder (LatticeFasterOnlineGrammarDecoder) – Online grammar decoder.