kaldi.asr
This module provides a number of speech recognizers with an easy-to-use API.
Note that in Kaldi, and therefore in PyKaldi, there is no single “canonical” decoder or fixed interface that decoders must satisfy. The same is true for the models. The decoders and models provided by Kaldi/PyKaldi can be mixed and matched to construct specialized speech recognizers. The speech recognizers in this module cover only the most “typical” combinations.
Classes
FasterRecognizer (decoder[, symbols, …]) 
Faster speech recognizer. 
GmmFasterRecognizer (transition_model, …[, …]) 
GMM based faster speech recognizer. 
GmmLatticeBiglmFasterRecognizer (…[, …]) 
GMM based lattice generating bigLM faster speech recognizer. 
GmmLatticeFasterRecognizer (transition_model, …) 
GMM based lattice generating faster speech recognizer. 
GmmRecognizer (transition_model, …[, …]) 
Base class for GMM based speech recognizers. 
LatticeBiglmFasterRecognizer (decoder[, …]) 
Lattice generating bigLM faster speech recognizer. 
LatticeFasterRecognizer (decoder[, symbols, …]) 
Lattice generating faster speech recognizer. 
LatticeLmRescorer (old_lm, new_lm[, phi_label]) 
Lattice LM rescorer. 
LatticeRnnlmPrunedRescorer (old_lm, …[, …]) 
Lattice RNNLM rescorer. 
MappedFasterRecognizer (transition_model, decoder) 
Mapped faster speech recognizer. 
MappedLatticeBiglmFasterRecognizer (…[, …]) 
Mapped lattice generating bigLM faster speech recognizer. 
MappedLatticeFasterRecognizer (…[, …]) 
Mapped lattice generating faster speech recognizer. 
MappedRecognizer (transition_model, decoder) 
Base class for mapped speech recognizers. 
NnetFasterRecognizer (transition_model, …) 
Neural network based faster speech recognizer. 
NnetLatticeBiglmFasterRecognizer (…[, …]) 
Neural network based lattice generating bigLM faster speech recognizer. 
NnetLatticeFasterBatchRecognizer (…[, …]) 
Neural network based lattice generating faster batch speech recognizer. 
NnetLatticeFasterGrammarRecognizer (…[, …]) 
Neural network based lattice generating faster grammar speech recognizer. 
NnetLatticeFasterOnlineGrammarRecognizer (…) 
Neural network based lattice generating faster online grammar speech recognizer. 
NnetLatticeFasterOnlineRecognizer (…[, …]) 
Neural network based lattice generating faster online speech recognizer. 
NnetLatticeFasterRecognizer (…[, symbols, …]) 
Neural network based lattice generating faster speech recognizer. 
NnetOnlineRecognizer (transition_model, …) 
Base class for neural network based online speech recognizers. 
NnetRecognizer (transition_model, …[, …]) 
Base class for neural network based speech recognizers. 
OnlineRecognizer (decoder[, symbols, …]) 
Base class for online speech recognizers. 
Recognizer (decoder[, symbols, …]) 
Base class for speech recognizers. 

class kaldi.asr.Recognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Base class for speech recognizers.
Parameters:
 decoder (object) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input.
Output is a dictionary with the following (key, value) pairs:
 key           value                         value type
 “alignment”   Frame-level alignment         List[int]
 “best_path”   Best lattice path             CompactLattice
 “lattice”     Output lattice                Lattice or CompactLattice
 “likelihood”  Log-likelihood of best path   float
 “text”        Output transcript             str
 “weight”      Cost of best path             LatticeWeight
 “words”       Words on best path            List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space-separated integer indices. Otherwise it will be a string of space-separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
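The rule for the “text” output can be sketched in plain Python. This is only an illustration: `words_to_text` is a hypothetical helper name, and a plain dict stands in for a SymbolTable so that the sketch runs without PyKaldi. It is not the library's implementation.

```python
def words_to_text(words, symbols=None):
    """Join word indices into a transcript string.

    `symbols` is a hypothetical stand-in for a symbol table: a dict
    mapping integer indices to symbol strings.
    """
    if symbols is None:
        # No symbol table: emit the integer indices themselves.
        return " ".join(str(w) for w in words)
    # Symbol table given: emit the symbol for each index.
    return " ".join(symbols[w] for w in words)


toy_symbols = {1: "hello", 2: "world"}
print(words_to_text([1, 2]))               # 1 2
print(words_to_text([1, 2], toy_symbols))  # hello world
```

In both cases the “words” output would stay a list of integers; only “text” changes form.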

class kaldi.asr.FasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Faster speech recognizer.
This recognizer can be used to decode log-likelihood matrices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing, since index 0 is reserved for epsilon symbols in OpenFst.
Parameters:
 decoder (FasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
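The 1-based lookup described for this recognizer can be sketched as follows. The function name `lookup_loglike` and the nested-list matrix are hypothetical, and the sketch assumes the acoustic scale is applied as a plain multiplier on each looked-up score; treat it as an illustration rather than the decoder's actual code path.

```python
def lookup_loglike(loglikes, frame, label, acoustic_scale=0.1):
    """Return the scaled score of a non-zero graph label at a frame.

    `loglikes` is a list of rows, one per frame. Column j holds the
    score for graph label j + 1, because label 0 is epsilon.
    """
    if label == 0:
        raise ValueError("label 0 is epsilon and has no score")
    return acoustic_scale * loglikes[frame][label - 1]


toy = [[-1.0, -2.0, -3.0],
       [-4.0, -5.0, -6.0]]
# Label 3 on frame 1 selects the third column of the second row.
print(lookup_loglike(toy, frame=1, label=3, acoustic_scale=1.0))  # -6.0
```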

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer.
Return type:

class kaldi.asr.LatticeFasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Lattice generating faster speech recognizer.
This recognizer can be used to decode log-likelihood matrices into lattices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing, since index 0 is reserved for epsilon symbols in OpenFst.
Parameters:
 decoder (LatticeFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.
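Since decode() is documented to raise RuntimeError when decoding fails, and to produce “lattice” only when the decoder can generate lattices, a batch loop may want to handle both cases. The sketch below is one hedged way to do that. `decode_all` and `StubRecognizer` are hypothetical names, and the stub exists only so the example runs without PyKaldi.

```python
def decode_all(recognizer, utterances):
    """Decode (key, input) pairs, skipping utterances that fail.

    Returns a tuple (transcripts, lattices, failed_keys).
    """
    transcripts, lattices, failed = {}, {}, []
    for key, data in utterances:
        try:
            out = recognizer.decode(data)
        except RuntimeError:
            # Documented failure mode: record the key and move on.
            failed.append(key)
            continue
        transcripts[key] = out["text"]
        # "lattice" may be missing for decoders that cannot produce one.
        lattice = out.get("lattice")
        if lattice is not None:
            lattices[key] = lattice
    return transcripts, lattices, failed


class StubRecognizer:
    """Hypothetical stand-in for a recognizer, for illustration only."""

    def decode(self, data):
        if not data:
            raise RuntimeError("decoding failed")
        return {"text": " ".join(str(x) for x in data)}


transcripts, lattices, failed = decode_all(
    StubRecognizer(), [("utt1", [4, 2]), ("utt2", [])])
print(transcripts)  # {'utt1': '4 2'}
print(failed)       # ['utt2']
```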

classmethod from_files(graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer.
Return type:
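A minimal usage sketch for from_files() and decode() might look like the following. The function name `transcribe_loglikes` is hypothetical, the beam and max_active values are illustrative rather than recommended, and the sketch assumes the input archive holds log-likelihood matrices readable with SequentialMatrixReader from kaldi.util.table. PyKaldi is imported lazily inside the function, so defining it does not require the package.

```python
def transcribe_loglikes(graph_rxfilename, symbols_filename, loglikes_rspecifier):
    """Yield (key, text) for each log-likelihood matrix in a Kaldi table.

    Requires PyKaldi at call time; the imports are deferred until the
    generator is first iterated.
    """
    from kaldi.asr import LatticeFasterRecognizer
    from kaldi.decoder import LatticeFasterDecoderOptions
    from kaldi.util.table import SequentialMatrixReader

    decoder_opts = LatticeFasterDecoderOptions()
    decoder_opts.beam = 13          # illustrative value
    decoder_opts.max_active = 7000  # illustrative value

    asr = LatticeFasterRecognizer.from_files(
        graph_rxfilename, symbols_filename, decoder_opts=decoder_opts)

    with SequentialMatrixReader(loglikes_rspecifier) as reader:
        for key, loglikes in reader:
            out = asr.decode(loglikes)
            yield key, out["text"]
```

A call such as `transcribe_loglikes("HCLG.fst", "words.txt", "ark:loglikes.ark")` uses placeholder file names; substitute the paths of an actual decoding graph, symbol table, and archive.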

class kaldi.asr.LatticeBiglmFasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Lattice generating bigLM faster speech recognizer.
This recognizer can be used to decode log-likelihood matrices into lattices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing, since index 0 is reserved for epsilon symbols in OpenFst.
Parameters:
 decoder (LatticeBiglmFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 graph_rxfilename (str) – Extended filename for reading the graph.
 old_lm_rxfilename (str) – Extended filename for reading the old LM.
 new_lm_rxfilename (str) – Extended filename for reading the new LM.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer.
Return type:
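The roles of the old and new LM are not spelled out here. Under the usual Kaldi bigLM reading, which is an assumption in this sketch, the old LM is the one already compiled into the decoding graph, and decoding replaces its scores with those of the new LM. With costs as negative log-probabilities, the arithmetic for a single path would be as below. `rescored_cost` is a hypothetical helper, not a PyKaldi function.

```python
def rescored_cost(graph_cost, old_lm_costs, new_lm_costs):
    """Swap the LM contribution of a path cost.

    graph_cost   -- path cost that already includes the old LM
    old_lm_costs -- per-word costs under the old LM
    new_lm_costs -- per-word costs under the new LM
    All costs are negative log-probabilities (assumed convention).
    """
    return graph_cost - sum(old_lm_costs) + sum(new_lm_costs)


# Old LM contributed 5.0 in total and the new LM assigns 4.0.
print(rescored_cost(10.0, [2.0, 3.0], [1.5, 2.5]))  # 9.0
```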

class kaldi.asr.MappedRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Base class for mapped speech recognizers.
Parameters:
 transition_model (TransitionModel) – The transition model.
 decoder (object) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
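This page does not define “mapped”. In Kaldi's mapped decodable objects, a transition-id on the graph is first mapped to a pdf-id through the transition model, and the pdf-id then selects a column of the log-likelihood matrix. Assuming that reading applies here, the lookup can be sketched as below. `lookup_mapped` is a hypothetical name, and a plain dict stands in for the transition model.

```python
def lookup_mapped(loglikes, frame, transition_id, tid_to_pdf, acoustic_scale=0.1):
    """Return the scaled score of a transition-id at a frame.

    `tid_to_pdf` is a hypothetical stand-in for the transition model:
    a dict from 1-based transition-ids to 0-based pdf-ids.
    """
    pdf_id = tid_to_pdf[transition_id]
    # Columns are indexed directly by pdf-id (assumed convention).
    return acoustic_scale * loglikes[frame][pdf_id]


toy_loglikes = [[-1.0, -2.0],
                [-3.0, -4.0]]
toy_map = {1: 0, 2: 0, 3: 1}  # several transition-ids may share a pdf-id
print(lookup_mapped(toy_loglikes, 1, 3, toy_map, acoustic_scale=1.0))  # -4.0
```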

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

class kaldi.asr.MappedFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Mapped faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 decoder (FasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the transition model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer object.
Return type:

read_model(model_rxfilename)
Reads the transition model from an extended filename.

class kaldi.asr.MappedLatticeFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Mapped lattice generating faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 decoder (LatticeFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the transition model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer object.
Return type:

read_model(model_rxfilename)
Reads the transition model from an extended filename.

class kaldi.asr.MappedLatticeBiglmFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Mapped lattice generating bigLM faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 decoder (LatticeBiglmFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the transition model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 old_lm_rxfilename (str) – Extended filename for reading the old LM.
 new_lm_rxfilename (str) – Extended filename for reading the new LM.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer.
Return type:

read_model(model_rxfilename)
Reads the transition model from an extended filename.

class kaldi.asr.GmmRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
Base class for GMM based speech recognizers.
Parameters:
 transition_model (TransitionModel) – The transition model.
 acoustic_model (AmDiagGmm) – The acoustic model.
 decoder (object) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

class kaldi.asr.GmmFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
GMM based faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 acoustic_model (AmDiagGmm) – The acoustic model.
 decoder (FasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new GMM recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns: A new GMM recognizer object.

read_model(model_rxfilename)
Reads the model from an extended filename.

class kaldi.asr.GmmLatticeFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
GMM based lattice generating faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 acoustic_model (AmDiagGmm) – The acoustic model.
 decoder (LatticeFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new GMM recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new GMM recognizer object.

read_model(model_rxfilename)
Reads the model from an extended filename.

class kaldi.asr.GmmLatticeBiglmFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)
GMM based lattice generating bigLM faster speech recognizer.
Parameters:
 transition_model (TransitionModel) – The transition model.
 acoustic_model (AmDiagGmm) – The acoustic model.
 decoder (LatticeBiglmFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

classmethod from_files(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)
Constructs a new recognizer from given files.
Parameters:
 model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 old_lm_rxfilename (str) – Extended filename for reading the old LM.
 new_lm_rxfilename (str) – Extended filename for reading the new LM.
 symbols_filename (str) – The symbols file. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns: A new recognizer.
Return type:

read_model(model_rxfilename)
Reads the model from an extended filename.

class kaldi.asr.NnetRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)
Base class for neural network based speech recognizers.
Parameters:
 transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (object) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, the “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

decode(input)
Decodes input. The output dictionary, parameters, return value, and raised exception are the same as documented for Recognizer.decode() above.

class
kaldi.asr.
NnetFasterRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]¶ Neural network based faster speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (FasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

decode
(input)¶ Decodes input.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

classmethod
from_files
(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns: A new recognizer.
Return type:

read_model
(model_rxfilename)¶ Reads model from an extended filename.
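The from_files() constructor documented above can be wrapped in a small helper, sketched below. The file names (`final.mdl`, `HCLG.fst`, `words.txt`) follow a common Kaldi layout but are assumptions, and `_FakeRecognizer` is a hypothetical stand-in that only exists so the sketch runs without PyKaldi installed.

```python
import os


def recognizer_from_dir(recognizer_cls, model_dir, **kwargs):
    """Build a recognizer from a directory via the class's from_files().

    The file names below are assumptions about the directory layout;
    adjust them to your own setup.
    """
    return recognizer_cls.from_files(
        os.path.join(model_dir, "final.mdl"),   # model_rxfilename
        os.path.join(model_dir, "HCLG.fst"),    # graph_rxfilename
        symbols_filename=os.path.join(model_dir, "words.txt"),
        **kwargs,
    )


class _FakeRecognizer:
    """Stand-in that records the arguments passed to from_files()."""

    def __init__(self, args, kwargs):
        self.args, self.kwargs = args, kwargs

    @classmethod
    def from_files(cls, *args, **kwargs):
        return cls(args, kwargs)


rec = recognizer_from_dir(_FakeRecognizer, "exp/model", allow_partial=False)
```

With PyKaldi available, `recognizer_cls` would be one of the recognizer classes on this page that provides from_files().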

class
kaldi.asr.
NnetLatticeFasterRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]¶ Neural network based lattice generating faster speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (LatticeFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

decode
(input)¶ Decodes input.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

classmethod
from_files
(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns: A new recognizer.
Return type:

read_model
(model_rxfilename)¶ Reads model from an extended filename.
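Since decode() raises RuntimeError when decoding fails, a loop over many utterances usually wants to record failures and carry on. The following is a minimal sketch of that pattern; `_StubRecognizer` is a hypothetical stand-in for a real recognizer, and the inputs are plain lists rather than feature matrices.

```python
def decode_all(recognizer, utterances):
    """Decode (key, input) pairs, collecting transcripts and failed keys.

    `recognizer` is any object whose decode(input) returns the output
    dictionary described above and raises RuntimeError on failure.
    """
    transcripts, failed = {}, []
    for key, feats in utterances:
        try:
            out = recognizer.decode(feats)
        except RuntimeError:
            # A failed decode is recorded and the loop moves on.
            failed.append(key)
            continue
        transcripts[key] = out["text"]
    return transcripts, failed


class _StubRecognizer:
    """Stand-in so the sketch runs without PyKaldi; fails on empty input."""

    def decode(self, feats):
        if not feats:
            raise RuntimeError("decoding failed")
        return {"text": " ".join(str(x) for x in feats)}


texts, failed = decode_all(_StubRecognizer(), [("utt1", [1, 2]), ("utt2", [])])
```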

class
kaldi.asr.
NnetLatticeFasterBatchRecognizer
(transition_model, acoustic_model, graph, symbols=None, allow_partial=True, decoder_opts=None, compute_opts=None, num_threads=1, online_ivector_period=10)[source]¶ Neural network based lattice generating faster batch speech recognizer.
This uses multiple CPU threads for the graph search, plus a GPU thread for the neural net inference. The interface of this object should be accessed from only one thread, presumably the main thread of the program.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 graph (StdFst) – The decoding graph.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
 compute_opts (NnetBatchComputerOptions) – Configuration options for neural network batch computer.
 num_threads (int) – Number of processing threads.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

accept_input
(key, input)[source]¶ Accepts input for decoding.
This should be called for each utterance that is to be decoded (interspersed with calls to get_output()). This call will block when no threads are ready to start processing this utterance.
Input can be a feature matrix alone, a tuple of a feature matrix and an ivector, or a tuple of a feature matrix and an online ivector matrix.
Parameters:
Raises: RuntimeError – If decoding fails.

finished
()[source]¶ Informs the decoder that all input has been provided.
This will block until all computation threads have terminated. After that you can keep calling get_output(), until it raises a ValueError, to get the outputs for the remaining utterances.
Returns: The number of utterances that have been successfully decoded.
Return type: int
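The calling order described for accept_input(), get_output() and finished() can be sketched as below. `_StubBatchRecognizer` is a hypothetical single-threaded stand-in that mimics only the documented behaviour (get_output() raising ValueError when nothing is ready); it is not how the real batch recognizer works internally.

```python
def decode_batch(recognizer, utterances):
    """Feed all utterances, then collect every available output."""
    outputs = []

    def drain():
        # get_output() does not block and raises ValueError when empty.
        while True:
            try:
                outputs.append(recognizer.get_output())
            except ValueError:
                return

    for key, feats in utterances:
        recognizer.accept_input(key, feats)
        drain()                      # pick up anything already finished
    num_ok = recognizer.finished()   # returns once all work is done
    drain()                          # collect the remaining outputs
    return num_ok, outputs


class _StubBatchRecognizer:
    """Stand-in that produces all outputs only when finished() is called."""

    def __init__(self):
        self._pending, self._ready = [], []

    def accept_input(self, key, feats):
        self._pending.append((key, feats))

    def finished(self):
        self._ready += [{"key": k, "text": str(len(f))} for k, f in self._pending]
        self._pending = []
        return len(self._ready)

    def get_output(self):
        if not self._ready:
            raise ValueError("no output to return")
        return self._ready.pop(0)


num_ok, outs = decode_batch(_StubBatchRecognizer(), [("a", [0, 0]), ("b", [0])])
```

Draining after each accept_input() call keeps the output queue short while the remaining utterances are still being fed.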

classmethod
from_files
(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, compute_opts=None, num_threads=1, online_ivector_period=10)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
 compute_opts (NnetBatchComputerOptions) – Configuration options for neural network batch computer.
 num_threads (int) – Number of processing threads.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns: A new recognizer.
Return type:

get_output
()[source]¶ Returns the next available output.
This returns the output for the first utterance in the output queue. The outputs returned by this method are guaranteed to be in the same order the inputs were provided, but they may be delayed and some outputs might be missing, for instance because of search failures.
This call does not block.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“key” | Utterance ID | str
“lattice” | Output lattice | Lattice or CompactLattice
“text” | Output transcript | str
The “lattice” output will be a deterministic compact lattice if lattice determinization is enabled. Otherwise, it will be a raw state-level lattice. The acoustic scores in the output lattice will already be divided by the acoustic scale used in decoding.
If the decoder was not initialized with a symbol table, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols.
Returns: A dictionary representing decoding output.
Raises: ValueError – If there is no output to return.
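Because some outputs might be missing, it can help to compare the “key” entries of the collected outputs with the keys that were submitted. A minimal sketch, using hand-made output dictionaries rather than real decoding results:

```python
def missing_keys(submitted_keys, outputs):
    """Return submitted keys that have no output, in submission order."""
    seen = {out["key"] for out in outputs}
    return [key for key in submitted_keys if key not in seen]


# Hand-made outputs: "utt2" produced no output (e.g. a search failure).
outputs = [{"key": "utt1", "text": "hello"}, {"key": "utt3", "text": "world"}]
print(missing_keys(["utt1", "utt2", "utt3"], outputs))
```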

get_outputs
()[source]¶ Creates a generator for iterating over available outputs.
Each output generated will be a dictionary like the output of get_output(). The outputs are generated in the same order the inputs were provided.
See Also: get_output()

class
kaldi.asr.
NnetLatticeFasterGrammarRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]¶ Neural network based lattice generating faster grammar speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (LatticeFasterGrammarDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

decode
(input)¶ Decodes input.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

classmethod
from_files
(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns: A new recognizer.
Return type:

read_model
(model_rxfilename)¶ Reads model from an extended filename.

class
kaldi.asr.
NnetLatticeBiglmFasterRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]¶ Neural network based lattice generating bigLM faster speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (LatticeBiglmFasterDecoder) – The decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.

decode
(input)¶ Decodes input.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
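The “weight” entry pairs a graph score with an acoustic score. The sketch below shows one way the two components could be combined. It rests on two assumptions that should be checked against your PyKaldi version: that the pair is exposed as `value1` (graph) and `value2` (acoustic), and that the best-path log-likelihood is the negated total cost. The `Weight` namedtuple is a stand-in, not the real LatticeWeight type.

```python
from collections import namedtuple

# Stand-in for a lattice weight; the value1/value2 naming is an assumption.
Weight = namedtuple("Weight", ["value1", "value2"])


def total_cost(weight):
    """Total best-path cost: graph score plus acoustic score."""
    return weight.value1 + weight.value2


def loglike_from_weight(weight):
    """Log-likelihood, assuming it is the negated total cost."""
    return -total_cost(weight)


w = Weight(value1=12.5, value2=30.0)
print(total_cost(w), loglike_from_weight(w))
```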

classmethod
from_files
(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 old_lm_rxfilename (str) – Extended filename for reading the old LM.
 new_lm_rxfilename (str) – Extended filename for reading the new LM.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
 decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
 online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns: A new recognizer.
Return type:

read_model
(model_rxfilename)¶ Reads model from an extended filename.

class
kaldi.asr.
OnlineRecognizer
(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]¶ Base class for online speech recognizers.
Parameters:  decoder (object) – The online decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 acoustic_scale (float) – Acoustic score scale.

advance_decoding
(max_num_frames=1)[source]¶ Advances decoding.
This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.
Parameters: max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.

decode
()[source]¶ Decodes all frames in the input pipeline and returns the output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

finalize_decoding
()[source]¶ Finalizes decoding.
This function may be optionally called after advance_decoding(), when you do not plan to decode any further. It does an extra pruning step that will help to prune the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final-state survived); it also does a final pruning step that visits all states (the pruning that is done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). If you call this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs = false.
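The ordering constraint above (no advance_decoding() after finalize_decoding()) can be made explicit with a thin wrapper, sketched below. `FinalizeGuard` and `_NoopRecognizer` are illustrative names that are not part of PyKaldi; the wrapper only tracks call order and forwards to the wrapped object.

```python
class FinalizeGuard:
    """Wrap an online recognizer and reject advance_decoding() after finalize."""

    def __init__(self, recognizer):
        self._rec = recognizer
        self._finalized = False

    def init_decoding(self):
        self._rec.init_decoding()
        self._finalized = False      # a new utterance resets the guard

    def advance_decoding(self, max_num_frames=1):
        if self._finalized:
            raise RuntimeError("advance_decoding() called after finalize_decoding()")
        self._rec.advance_decoding(max_num_frames)

    def finalize_decoding(self):
        self._rec.finalize_decoding()
        self._finalized = True


class _NoopRecognizer:
    """Hypothetical stand-in with do-nothing decoding methods."""

    def init_decoding(self):
        pass

    def advance_decoding(self, max_num_frames=1):
        pass

    def finalize_decoding(self):
        pass


guard = FinalizeGuard(_NoopRecognizer())
guard.init_decoding()
guard.advance_decoding()
guard.finalize_decoding()
try:
    guard.advance_decoding()
    blocked = False
except RuntimeError:
    blocked = True
```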

get_output
()[source]¶ Returns decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

get_partial_output
(use_final_probs=False)[source]¶ Returns partial decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | Lattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
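Partial hypotheses can change as more frames are decoded, so a display often shows only the part that two consecutive partial outputs agree on. The helper below is an illustrative technique, not part of PyKaldi; it works on hand-made “text” values split into words.

```python
def common_prefix(prev_words, cur_words):
    """Return the leading words shared by two consecutive hypotheses."""
    prefix = []
    for a, b in zip(prev_words, cur_words):
        if a != b:
            break
        prefix.append(a)
    return prefix


# Hand-made "text" values from two successive partial outputs.
prev = "turn on the".split()
cur = "turn on that light".split()
print(" ".join(common_prefix(prev, cur)))
```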

init_decoding
()[source]¶ Initializes decoding.
This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.
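Put together, the methods above suggest a chunk-by-chunk loop like the one sketched here. The `feed` callback is hypothetical: it stands for whatever pushes new data into the recognizer's input pipeline, which is outside the scope of this sketch. `_StubOnlineRecognizer` is likewise a stand-in so the code runs without PyKaldi.

```python
def decode_stream(recognizer, chunks, feed):
    """Decode a stream chunk by chunk; return (partial texts, final output)."""
    partials = []
    recognizer.init_decoding()
    for chunk in chunks:
        feed(chunk)
        # A negative max_num_frames decodes all frames currently available.
        recognizer.advance_decoding(max_num_frames=-1)
        partials.append(recognizer.get_partial_output()["text"])
    recognizer.finalize_decoding()   # no advance_decoding() after this point
    return partials, recognizer.get_output()


class _StubOnlineRecognizer:
    """Stand-in that 'recognizes' exactly the items it was fed."""

    def __init__(self):
        self.pipeline, self.decoded = [], []

    def init_decoding(self):
        self.decoded = []

    def advance_decoding(self, max_num_frames=1):
        self.decoded += self.pipeline
        self.pipeline.clear()

    def get_partial_output(self, use_final_probs=False):
        return {"text": " ".join(self.decoded)}

    def finalize_decoding(self):
        pass

    def get_output(self):
        return {"text": " ".join(self.decoded)}


rec = _StubOnlineRecognizer()
partials, final = decode_stream(rec, [["a"], ["b", "c"]], rec.pipeline.extend)
```

Calling init_decoding() at the start also covers the case where the same recognizer is reused for a new utterance.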

class
kaldi.asr.
NnetOnlineRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)[source]¶ Base class for neural network based online speech recognizers.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (object) – The online decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
 endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.

advance_decoding
(max_num_frames=1)¶ Advances decoding.
This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.
Parameters: max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.

decode
()¶ Decodes all frames in the input pipeline and returns the output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

finalize_decoding
()¶ Finalizes decoding.
This function may be optionally called after advance_decoding(), when you do not plan to decode any further. It does an extra pruning step that will help to prune the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final-state survived); it also does a final pruning step that visits all states (the pruning that is done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). If you call this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs = false.

get_output
()¶ Returns decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

get_partial_output
(use_final_probs=False)¶ Returns partial decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | Lattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

init_decoding
()¶ Initializes decoding.
This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

class
kaldi.asr.
NnetLatticeFasterOnlineRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)[source]¶ Neural network based lattice generating faster online speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (LatticeFasterOnlineDecoder) – The online decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
 endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.

advance_decoding
(max_num_frames=1)¶ Advances decoding.
This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.
Parameters: max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.

decode
()¶ Decodes all frames in the input pipeline and returns the output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

finalize_decoding
()¶ Finalizes decoding.
This function may be optionally called after advance_decoding(), when you do not plan to decode any further. It does an extra pruning step that will help to prune the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final-state survived); it also does a final pruning step that visits all states (the pruning that is done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). If you call this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs = false.

classmethod
from_files
(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, endpoint_opts=None)[source]¶ Constructs a new recognizer from given files.
Parameters:  model_rxfilename (str) – Extended filename for reading the model.
 graph_rxfilename (str) – Extended filename for reading the graph.
 symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
 decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
 endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
Returns: A new recognizer.
Return type:

get_output
()¶ Returns decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | CompactLattice
“lattice” | Output lattice | Lattice or CompactLattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

get_partial_output
(use_final_probs=False)¶ Returns partial decoding output.
Output is a dictionary with the following (key, value) pairs:
key | value | value type
“alignment” | Frame-level alignment | List[int]
“best_path” | Best lattice path | Lattice
“likelihood” | Log-likelihood of best path | float
“text” | Output transcript | str
“weight” | Cost of best path | LatticeWeight
“words” | Words on best path | List[int]
If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

init_decoding
()¶ Initializes decoding.
This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

read_model
(model_rxfilename)¶ Reads model from an extended filename.

class
kaldi.asr.
NnetLatticeFasterOnlineGrammarRecognizer
(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)[source]¶ Neural network based lattice generating faster online grammar speech recognizer.
Parameters:  transition_model (TransitionModel) – The transition model.
 acoustic_model (AmNnetSimple) – The acoustic model.
 decoder (LatticeFasterOnlineGrammarDecoder) – The online decoder.
 symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
 allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
 decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
 endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.

advance_decoding(max_num_frames=-1)
Advances decoding.
This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.
Parameters: max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.
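The frame limit can be read as simple arithmetic on frame counts. The following is a minimal sketch under that reading; `frames_decoded_after` and its arguments are hypothetical names for illustration, not part of PyKaldi.

```python
def frames_decoded_after(num_decoded, num_ready, max_num_frames):
    """Total frames decoded once one advance_decoding() call returns.

    num_decoded: frames already decoded before the call.
    num_ready: frames currently available in the input pipeline.
    max_num_frames: per-call limit; a negative value means "no limit".
    """
    if max_num_frames < 0:
        return num_ready
    return min(num_ready, num_decoded + max_num_frames)


print(frames_decoded_after(10, 50, -1))
print(frames_decoded_after(10, 50, 20))
```

A small positive limit keeps each call short, which can help when partial results need to be polled between calls.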

decode()
Decodes all frames in the input pipeline and returns the output.
Output is a dictionary with the following (key, value) pairs:
key           value                          value type
“alignment”   Frame-level alignment          List[int]
“best_path”   Best lattice path              CompactLattice
“lattice”     Output lattice                 Lattice or CompactLattice
“likelihood”  Log-likelihood of best path    float
“text”        Output transcript              str
“weight”      Cost of best path              LatticeWeight
“words”       Words on best path             List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space-separated integer indices. Otherwise it will be a string of space-separated symbols. The “weight” output is a lattice weight consisting of (graph score, acoustic score).
Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
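Since the “lattice” entry is optional and decoding can raise RuntimeError, calling code usually needs to handle both cases. The helpers below are a hedged sketch that works on any dictionary shaped like the table above; `try_decode` and `summarize_output` are hypothetical names, not PyKaldi functions.

```python
def try_decode(recognizer):
    """Call decode() and return its output, or None if decoding fails."""
    try:
        return recognizer.decode()
    except RuntimeError:
        return None


def summarize_output(out):
    """Return a one-line summary of a decoding output dictionary."""
    # The "lattice" entry exists only for lattice-generating decoders.
    has_lattice = out.get("lattice") is not None
    return "text={!r} likelihood={:.2f} lattice={}".format(
        out["text"], out["likelihood"], "yes" if has_lattice else "no")


# Plain dictionary standing in for real decoder output.
example = {"text": "hello world", "likelihood": -12.5, "words": [1, 2]}
print(summarize_output(example))
```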

finalize_decoding()
Finalizes decoding.
This function may optionally be called after advance_decoding(), when you do not plan to decode any further. It performs an extra pruning step that prunes the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final state survived); it also does a final pruning pass that visits all states (the pruning done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). If you call this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs = false.
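One way to catch the ordering mistake early is a thin wrapper that tracks whether decoding has been finalized. This is only an illustrative sketch: `FinalizeGuard` is a hypothetical class, not part of PyKaldi, and it assumes that calling init_decoding() starts a fresh utterance and makes advancing legal again.

```python
class FinalizeGuard:
    """Wraps a recognizer and rejects advance_decoding() after finalizing."""

    def __init__(self, recognizer):
        self._recognizer = recognizer
        self._finalized = False

    def init_decoding(self):
        self._recognizer.init_decoding()
        self._finalized = False  # a new utterance may be advanced again

    def advance_decoding(self, *args, **kwargs):
        if self._finalized:
            raise RuntimeError(
                "advance_decoding() called after finalize_decoding()")
        return self._recognizer.advance_decoding(*args, **kwargs)

    def finalize_decoding(self):
        self._recognizer.finalize_decoding()
        self._finalized = True

    def __getattr__(self, name):
        # Delegate everything else (get_output, decode, ...) unchanged.
        return getattr(self._recognizer, name)
```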

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, endpoint_opts=None)
Constructs a new recognizer from given files.
Parameters:
  model_rxfilename (str) – Extended filename for reading the model.
  graph_rxfilename (str) – Extended filename for reading the graph.
  symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
  decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
Returns: A new recognizer.
Return type:
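A hedged sketch of calling this classmethod is shown below. It uses only the class and argument names listed above and leaves all option objects at their defaults; `build_grammar_recognizer` is a hypothetical helper, and the import is deferred so the function can be defined on a machine without PyKaldi installed.

```python
def build_grammar_recognizer(model_rxfilename, graph_rxfilename,
                             symbols_filename=None):
    """Construct a recognizer from files, leaving all options at defaults."""
    # Imported lazily so this module loads even when PyKaldi is unavailable.
    from kaldi.asr import NnetLatticeFasterOnlineGrammarRecognizer

    return NnetLatticeFasterOnlineGrammarRecognizer.from_files(
        model_rxfilename,
        graph_rxfilename,
        symbols_filename=symbols_filename,
        allow_partial=True,
    )
```

Passing `symbols_filename` is what makes the “text” output contain words rather than integer indices.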

get_output()
Returns decoding output.
Output is a dictionary with the following (key, value) pairs:
key           value                          value type
“alignment”   Frame-level alignment          List[int]
“best_path”   Best lattice path              CompactLattice
“lattice”     Output lattice                 Lattice or CompactLattice
“likelihood”  Log-likelihood of best path    float
“text”        Output transcript              str
“weight”      Cost of best path              LatticeWeight
“words”       Words on best path             List[int]
The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.
If symbols is None, the “text” output will be a string of space-separated integer indices. Otherwise it will be a string of space-separated symbols. The “weight” output is a lattice weight consisting of (graph score, acoustic score).
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
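The table lists both a cost (“weight”) and a log-likelihood for the best path. Assuming the usual Kaldi convention that costs are negated log-probabilities (an assumption; this section does not state it), the two would be related as sketched below. `Weight` is a hypothetical stand-in for LatticeWeight with made-up field names.

```python
from collections import namedtuple

# Hypothetical stand-in for a lattice weight: (graph score, acoustic score).
Weight = namedtuple("Weight", ["graph_cost", "acoustic_cost"])


def log_likelihood(weight):
    """Negated total cost, under the cost = -log(probability) assumption."""
    return -(weight.graph_cost + weight.acoustic_cost)


print(log_likelihood(Weight(graph_cost=2.5, acoustic_cost=10.0)))
```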

get_partial_output(use_final_probs=False)
Returns partial decoding output.
Output is a dictionary with the following (key, value) pairs:
key           value                          value type
“alignment”   Frame-level alignment          List[int]
“best_path”   Best lattice path              Lattice
“likelihood”  Log-likelihood of best path    float
“text”        Output transcript              str
“weight”      Cost of best path              LatticeWeight
“words”       Words on best path             List[int]
If symbols is None, the “text” output will be a string of space-separated integer indices. Otherwise it will be a string of space-separated symbols. The “weight” output is a lattice weight consisting of (graph score, acoustic score).
Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
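The relationship between the “words” and “text” entries can be illustrated in plain Python. This is a sketch of the documented behaviour, not PyKaldi's implementation: `words_to_text` is a hypothetical helper, and a dict stands in for the symbol table.

```python
def words_to_text(words, symbols=None):
    """Join word indices into a space-separated transcript string.

    symbols: optional mapping from integer index to word string
    (a plain dict here, standing in for a real symbol table).
    """
    if symbols is None:
        return " ".join(str(index) for index in words)
    return " ".join(symbols[index] for index in words)


toy_symbols = {1: "hello", 2: "world"}  # made-up vocabulary
print(words_to_text([1, 2]))
print(words_to_text([1, 2], toy_symbols))
```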

init_decoding()
Initializes decoding.
This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

read_model(model_rxfilename)
Reads model from an extended filename.

class kaldi.asr.LatticeLmRescorer(old_lm, new_lm, phi_label=None)
Lattice LM rescorer.
If phi_label is provided, rescoring will be “exact” in the sense that backoff arcs in the new LM will only be taken if there are no other matching arcs. Inexact rescoring can overestimate the new LM scores for some paths in the output lattice. This happens when backoff paths have higher scores than matching regular paths in the new LM.
Parameters:
  old_lm (StdFst) – Old language model FST.
  new_lm (StdFst) – New language model FST.
  phi_label (int) – Backoff label in the new LM.
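To make the exact/inexact distinction concrete, here is a toy sketch in plain Python. It does not use FSTs or PyKaldi; the `lm` dictionary layout, `bigram_cost`, and all numbers are made up for illustration. Costs are negative log-probabilities, so a lower cost means a higher score.

```python
def bigram_cost(lm, history, word, exact):
    """Cost of `word` following `history` in a toy backoff bigram LM."""
    explicit = lm["bigrams"].get((history, word))
    backoff = lm["backoff"][history] + lm["unigrams"][word]
    if explicit is None:
        return backoff             # no matching arc: backoff is the only path
    if exact:
        return explicit            # matching arc exists: backoff is not taken
    return min(explicit, backoff)  # inexact: the cheaper path wins


toy_lm = {
    "bigrams": {("the", "cat"): 3.0},
    "backoff": {"the": 0.5},
    "unigrams": {"cat": 1.0, "dog": 2.0},
}
print(bigram_cost(toy_lm, "the", "cat", exact=True))
print(bigram_cost(toy_lm, "the", "cat", exact=False))
```

In this toy LM the backoff path for "the cat" is cheaper than the matching bigram, so the inexact variant assigns the pair a better score than the LM intends. That is the overestimation described above.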

classmethod from_files(old_lm_rxfilename, new_lm_rxfilename, phi_label=None)
Constructs a new lattice LM rescorer from given files.
Parameters:
Returns: A new lattice LM rescorer.
Return type: LatticeRescorer

rescore(lat)
Rescores input lattice.
Parameters: lat (CompactLatticeFst) – Input lattice.
Returns: Rescored lattice.
Return type: CompactLatticeVectorFst