kaldi.asr

This module provides a number of speech recognizers with an easy-to-use API.

Note that in Kaldi, and therefore in PyKaldi, there is no single “canonical” decoder and no fixed interface that decoders must satisfy. The same is true for the models. The decoders and models provided by Kaldi/PyKaldi can be mixed and matched to construct specialized speech recognizers. The speech recognizers in this module cover only the most “typical” combinations.

Classes

FasterRecognizer(decoder[, symbols, …]) Faster speech recognizer.
GmmFasterRecognizer(transition_model, …[, …]) GMM based faster speech recognizer.
GmmLatticeBiglmFasterRecognizer(…[, …]) GMM based lattice generating big-LM faster speech recognizer.
GmmLatticeFasterRecognizer(transition_model, …) GMM based lattice generating faster speech recognizer.
GmmRecognizer(transition_model, …[, …]) Base class for GMM based speech recognizers.
LatticeBiglmFasterRecognizer(decoder[, …]) Lattice generating big-LM faster speech recognizer.
LatticeFasterRecognizer(decoder[, symbols, …]) Lattice-generating faster speech recognizer.
LatticeLmRescorer(old_lm, new_lm[, phi_label]) Lattice LM rescorer.
LatticeRnnlmPrunedRescorer(old_lm, …[, …]) Lattice RNNLM rescorer.
MappedFasterRecognizer(transition_model, decoder) Mapped faster speech recognizer.
MappedLatticeBiglmFasterRecognizer(…[, …]) Mapped lattice generating big-LM faster speech recognizer.
MappedLatticeFasterRecognizer(…[, …]) Mapped lattice generating faster speech recognizer.
MappedRecognizer(transition_model, decoder) Base class for mapped speech recognizers.
NnetFasterRecognizer(transition_model, …) Neural network based faster speech recognizer.
NnetLatticeBiglmFasterRecognizer(…[, …]) Neural network based lattice generating big-LM faster speech recognizer.
NnetLatticeFasterBatchRecognizer(…[, …]) Neural network based lattice generating faster batch speech recognizer.
NnetLatticeFasterGrammarRecognizer(…[, …]) Neural network based lattice generating faster grammar speech recognizer.
NnetLatticeFasterOnlineGrammarRecognizer(…) Neural network based lattice generating faster online grammar speech recognizer.
NnetLatticeFasterOnlineRecognizer(…[, …]) Neural network based lattice generating faster online speech recognizer.
NnetLatticeFasterRecognizer(…[, symbols, …]) Neural network based lattice generating faster speech recognizer.
NnetOnlineRecognizer(transition_model, …) Base class for neural network based online speech recognizers.
NnetRecognizer(transition_model, …[, …]) Base class for neural network based speech recognizers.
OnlineRecognizer(decoder[, symbols, …]) Base class for online speech recognizers.
Recognizer(decoder[, symbols, …]) Base class for speech recognizers.
class kaldi.asr.Recognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Base class for speech recognizers.

Parameters:
  • decoder (object) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
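How allow_partial relates to the RuntimeError raised by decode() can be sketched in plain Python. The helper name finalize and its arguments are hypothetical and not part of PyKaldi. The sketch only assumes, as the parameter description suggests, that a missing final state counts as a failure when allow_partial is False.

```python
def finalize(reached_final, allow_partial=True):
    """Return "complete" or "partial", or raise if a partial result is not allowed.

    Hypothetical helper: `reached_final` stands in for whatever the decoder
    reports about final states on the last frame.
    """
    if not reached_final and not allow_partial:
        raise RuntimeError("No final state was active on the last frame.")
    return "complete" if reached_final else "partial"


# A partial result is accepted because allow_partial defaults to True.
print(finalize(reached_final=False))
```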
decode(input)[source]

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).
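The two forms of the “text” output described above can be illustrated with a small stand-alone sketch. The function words_to_text is hypothetical, and a plain dict stands in for the SymbolTable; the actual lookup inside PyKaldi may differ.

```python
def words_to_text(words, symbols=None):
    """Join word indices into a transcript string.

    `symbols` is a hypothetical stand-in for a symbol table: any mapping
    from integer index to symbol string.
    """
    if symbols is None:
        # No symbol table: emit the integer indices themselves.
        return " ".join(str(w) for w in words)
    # Symbol table given: map each index to its symbol.
    return " ".join(symbols[w] for w in words)


symbols = {1: "hello", 2: "world"}   # illustrative entries only
print(words_to_text([1, 2]))           # integer indices
print(words_to_text([1, 2], symbols))  # symbols
```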

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
class kaldi.asr.FasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Faster speech recognizer.

This recognizer can be used to decode log-likelihood matrices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing – index 0 is reserved for epsilon symbols in OpenFst.
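The 1-based lookup described above can be sketched without PyKaldi. The function name log_likelihood and the nested-list matrix are illustrative assumptions; the sketch only shows that graph label k reads column k - 1 and that label 0 (epsilon) has no score.

```python
def log_likelihood(loglikes, frame, label):
    """Look up the score of a non-epsilon graph label on a given frame.

    `loglikes` is a frames x labels matrix (here a list of rows).
    Labels are 1-based, so label k reads column k - 1.
    """
    if label == 0:
        raise ValueError("label 0 is epsilon and has no acoustic score")
    return loglikes[frame][label - 1]


# Two frames, three labels (values are made up for illustration).
loglikes = [[-1.0, -2.0, -3.0],
            [-0.5, -4.0, -2.5]]
print(log_likelihood(loglikes, frame=0, label=1))
print(log_likelihood(loglikes, frame=1, label=3))
```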

Parameters:
  • decoder (FasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer.

Return type:

FasterRecognizer

class kaldi.asr.LatticeFasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Lattice-generating faster speech recognizer.

This recognizer can be used to decode log-likelihood matrices into lattices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing – index 0 is reserved for epsilon symbols in OpenFst.

Parameters:
  • decoder (LatticeFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer.

Return type:

LatticeFasterRecognizer
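A minimal usage sketch for from_files() and decode() follows. It assumes PyKaldi is installed and that LatticeFasterDecoderOptions (from kaldi.decoder) and SequentialMatrixReader (from kaldi.util.table) are available. The function name decode_loglikes, the option values, and the paths in the usage comment are hypothetical. Imports are done inside the function so that the definition itself loads without PyKaldi.

```python
def decode_loglikes(graph_rxfilename, symbols_filename, loglikes_rspecifier):
    """Decode each log-likelihood matrix in a table, yielding (key, text) pairs."""
    # Imported lazily so this sketch can be loaded without PyKaldi installed.
    from kaldi.asr import LatticeFasterRecognizer
    from kaldi.decoder import LatticeFasterDecoderOptions
    from kaldi.util.table import SequentialMatrixReader

    decoder_opts = LatticeFasterDecoderOptions()
    decoder_opts.beam = 13          # illustrative values, not recommendations
    decoder_opts.max_active = 7000

    asr = LatticeFasterRecognizer.from_files(
        graph_rxfilename, symbols_filename, decoder_opts=decoder_opts)

    with SequentialMatrixReader(loglikes_rspecifier) as reader:
        for key, loglikes in reader:
            try:
                out = asr.decode(loglikes)
            except RuntimeError:
                # decode() raises RuntimeError when decoding fails; skip it.
                continue
            yield key, out["text"]


# Hypothetical usage (file names are placeholders):
# for key, text in decode_loglikes("HCLG.fst", "words.txt", "ark:loglikes.ark"):
#     print(key, text)
```

Because symbols_filename is passed, the yielded text contains symbols rather than integer indices.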

class kaldi.asr.LatticeBiglmFasterRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Lattice generating big-LM faster speech recognizer.

This recognizer can be used to decode log-likelihood matrices into lattices. Non-zero labels on the decoding graph, e.g. transition-ids, are looked up in the log-likelihood matrices using 1-based indexing – index 0 is reserved for epsilon symbols in OpenFst.

Parameters:
  • decoder (LatticeBiglmFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • old_lm_rxfilename (str) – Extended filename for reading the old LM.
  • new_lm_rxfilename (str) – Extended filename for reading the new LM.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer.

Return type:

LatticeBiglmFasterRecognizer

class kaldi.asr.MappedRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Base class for mapped speech recognizers.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • decoder (object) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
static read_model(model_rxfilename)[source]

Reads transition model from an extended filename.

class kaldi.asr.MappedFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Mapped faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • decoder (FasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the transition model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer object.

Return type:

MappedFasterRecognizer

read_model(model_rxfilename)

Reads transition model from an extended filename.

class kaldi.asr.MappedLatticeFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Mapped lattice generating faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • decoder (LatticeFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the transition model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer object.

Return type:

MappedLatticeFasterRecognizer

read_model(model_rxfilename)

Reads transition model from an extended filename.

class kaldi.asr.MappedLatticeBiglmFasterRecognizer(transition_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Mapped lattice generating big-LM faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • decoder (LatticeBiglmFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the transition model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • old_lm_rxfilename (str) – Extended filename for reading the old LM.
  • new_lm_rxfilename (str) – Extended filename for reading the new LM.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer.

Return type:

MappedLatticeBiglmFasterRecognizer

read_model(model_rxfilename)

Reads transition model from an extended filename.

class kaldi.asr.GmmRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Base class for GMM based speech recognizers.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmDiagGmm) – The acoustic model.
  • decoder (object) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
static read_model(model_rxfilename)[source]

Reads model from an extended filename.

class kaldi.asr.GmmFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

GMM based faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmDiagGmm) – The acoustic model.
  • decoder (FasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new GMM recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new GMM recognizer object.

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.GmmLatticeFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

GMM based lattice generating faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmDiagGmm) – The acoustic model.
  • decoder (LatticeFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new GMM recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new GMM recognizer object.
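For the GMM based recognizers, the input to decode() is a feature matrix rather than a log-likelihood matrix, and from_files() additionally takes the model file. The sketch below assumes PyKaldi is installed and that SequentialMatrixReader from kaldi.util.table is available. The function name decode_features and all paths are hypothetical, and the features are assumed to match those the model was trained on.

```python
def decode_features(model_rxfilename, graph_rxfilename, symbols_filename,
                    feats_rspecifier):
    """Decode feature matrices with a GMM model; return {key: selected outputs}."""
    # Imported lazily so this sketch can be loaded without PyKaldi installed.
    from kaldi.asr import GmmLatticeFasterRecognizer
    from kaldi.util.table import SequentialMatrixReader

    asr = GmmLatticeFasterRecognizer.from_files(
        model_rxfilename, graph_rxfilename, symbols_filename,
        allow_partial=False,   # let decode() raise instead of returning partial output
        acoustic_scale=0.1)

    results = {}
    with SequentialMatrixReader(feats_rspecifier) as reader:
        for key, feats in reader:
            out = asr.decode(feats)
            results[key] = {
                "text": out["text"],
                "likelihood": out["likelihood"],
                # .get() is used because "lattice" is only produced by
                # lattice-generating decoders.
                "lattice": out.get("lattice"),
            }
    return results


# Hypothetical usage (file names are placeholders):
# results = decode_features("final.mdl", "HCLG.fst", "words.txt", "ark:feats.ark")
```

With allow_partial=False, a RuntimeError from decode() propagates to the caller, which is one way to make incomplete decodes visible instead of silently accepting them.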

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.GmmLatticeBiglmFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

GMM based lattice generating big-LM faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmDiagGmm) – The acoustic model.
  • decoder (LatticeBiglmFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, acoustic_scale=0.1, decoder_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • old_lm_rxfilename (str) – Extended filename for reading the old LM.
  • new_lm_rxfilename (str) – Extended filename for reading the new LM.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
Returns:

A new recognizer.

Return type:

GmmLatticeBiglmFasterRecognizer

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.NnetRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]

Base class for neural network based speech recognizers.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (object) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

Key           Value                        Value type
“alignment”   Frame-level alignment        List[int]
“best_path”   Best lattice path            CompactLattice
“lattice”     Output lattice               Lattice or CompactLattice
“likelihood”  Log-likelihood of best path  float
“text”        Output transcript            str
“weight”      Cost of best path            LatticeWeight
“words”       Words on best path           List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:input (object) – Input to decode.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
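
As a minimal sketch of consuming the dictionary described above, the helper below formats a one-line summary from the “text”, “words”, “likelihood” and “lattice” entries. The name summarize_output and the sample values are assumptions made for illustration; the sample dictionary uses plain Python values and leaves out the lattice objects.

```python
def summarize_output(out):
    """Format a one-line summary of a decode() output dictionary."""
    num_words = len(out["words"])
    # "lattice" is produced only if the decoder can generate lattices,
    # so treat a missing or None entry as "no lattice".
    has_lattice = out.get("lattice") is not None
    return "{} | words={} | loglike={:.2f} | lattice={}".format(
        out["text"], num_words, out["likelihood"], has_lattice)


# Stand-in for a real decode() result (all values are made up).
sample = {
    "alignment": [4, 3, 3, 8, 7],
    "likelihood": -12.5,
    "text": "hello world",
    "words": [17, 42],
}
print(summarize_output(sample))
```
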
static read_model(model_rxfilename)[source]

Reads model from an extended filename.

class kaldi.asr.NnetFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]

Neural network based faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (FasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:input (object) – Input to decode.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
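
Because decode() raises RuntimeError when decoding fails, a loop over many utterances usually wants to record the failure and move on. The sketch below shows one way to do that; decode_all and _StubRecognizer are hypothetical names, and the stub only mimics the documented decode() contract so the example runs without PyKaldi.

```python
def decode_all(recognizer, utterances):
    """Decode (key, features) pairs, collecting transcripts and failures."""
    transcripts, failed = {}, []
    for key, feats in utterances:
        try:
            out = recognizer.decode(feats)
        except RuntimeError:
            # decode() is documented to raise RuntimeError on failure.
            failed.append(key)
            continue
        transcripts[key] = out["text"]
    return transcripts, failed


class _StubRecognizer:
    """Stand-in exposing decode(); NOT a PyKaldi class."""

    def decode(self, feats):
        if not feats:
            raise RuntimeError("decoding failed")
        return {"text": " ".join(str(x) for x in feats)}


transcripts, failed = decode_all(
    _StubRecognizer(), [("utt1", [3, 5]), ("utt2", [])])
print(transcripts, failed)
```
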
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (FasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns:

A new recognizer.

Return type:

NnetFasterRecognizer

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.NnetLatticeFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]

Neural network based lattice generating faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (LatticeFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:input (object) – Input to decode.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
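
When no symbol table is given, the “text” output is a string of space separated integer indices. The sketch below shows how such a string could be mapped to words after the fact with an ordinary dictionary. The function indices_to_words, the vocabulary and the "<unk>" fallback are illustrative assumptions, not part of the module.

```python
def indices_to_words(text, id_to_word, unknown="<unk>"):
    """Map a string of space separated integer indices to symbols."""
    return " ".join(id_to_word.get(int(tok), unknown) for tok in text.split())


# Hypothetical vocabulary standing in for a symbols file.
id_to_word = {1: "hello", 2: "world"}
print(indices_to_words("1 2 1", id_to_word))
```
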
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns:

A new recognizer.

Return type:

NnetLatticeFasterRecognizer

read_model(model_rxfilename)

Reads model from an extended filename.
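
A minimal sketch of building this recognizer with from_files() follows. It needs a working PyKaldi installation to actually run, so the imports are placed inside the function. The wrapper name load_recognizer and the option values are illustrative assumptions, and the option attribute names (beam, max_active, acoustic_scale) should be checked against the installed version.

```python
def load_recognizer(model_rxfilename, graph_rxfilename, symbols_filename=None):
    """Build an NnetLatticeFasterRecognizer from files (requires PyKaldi)."""
    # Imported lazily so this snippet can be loaded without PyKaldi installed.
    from kaldi.asr import NnetLatticeFasterRecognizer
    from kaldi.decoder import LatticeFasterDecoderOptions
    from kaldi.nnet3 import NnetSimpleComputationOptions

    decoder_opts = LatticeFasterDecoderOptions()
    decoder_opts.beam = 13             # illustrative value
    decoder_opts.max_active = 7000     # illustrative value
    decodable_opts = NnetSimpleComputationOptions()
    decodable_opts.acoustic_scale = 1.0  # illustrative value

    return NnetLatticeFasterRecognizer.from_files(
        model_rxfilename, graph_rxfilename, symbols_filename,
        decoder_opts=decoder_opts, decodable_opts=decodable_opts)
```

A call might look like load_recognizer("final.mdl", "HCLG.fst", "words.txt"), where the three file names are placeholders for your own model, graph and symbols files.
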

class kaldi.asr.NnetLatticeFasterBatchRecognizer(transition_model, acoustic_model, graph, symbols=None, allow_partial=True, decoder_opts=None, compute_opts=None, num_threads=1, online_ivector_period=10)[source]

Neural network based lattice generating faster batch speech recognizer.

This uses multiple CPU threads for the graph search, plus a GPU thread for the neural net inference. The interface of this object should be accessed from only one thread, presumably the main thread of the program.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • graph (StdFst) – The decoding graph.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • compute_opts (NnetBatchComputerOptions) – Configuration options for neural network batch computer.
  • num_threads (int) – Number of processing threads.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
accept_input(key, input)[source]

Accepts input for decoding.

This should be called for each utterance that is to be decoded (interspersed with calls to get_output()). This call will block when no threads are ready to start processing this utterance.

Input can be a feature matrix alone, a tuple of a feature matrix and an ivector, or a tuple of a feature matrix and an online ivector matrix.

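Parameters:
  • key (str) – Utterance ID. Returned under “key” in the output of get_output().
  • input (object) – Input to decode.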
Raises:

RuntimeError – If decoding fails.

finished()[source]

Informs the decoder that all input has been provided.

This will block until all computation threads have terminated. After that you can keep calling get_output(), until it raises a ValueError, to get the outputs for the remaining utterances.

Returns:The number of utterances that have been successfully decoded.
Return type:int
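
The calling pattern described for accept_input(), get_output() and finished() can be sketched as follows. The helpers drain and decode_batch are hypothetical names, and _StubBatchRecognizer is a single-threaded stand-in that only imitates the documented behaviour so the example runs without PyKaldi or a GPU.

```python
from collections import deque


def drain(recognizer):
    """Collect outputs until get_output() raises ValueError (nothing left)."""
    outputs = []
    while True:
        try:
            outputs.append(recognizer.get_output())
        except ValueError:
            return outputs


def decode_batch(recognizer, utterances):
    """Feed (key, input) pairs, interspersed with get_output() calls."""
    outputs = []
    for key, feats in utterances:
        recognizer.accept_input(key, feats)
        outputs.extend(drain(recognizer))
    num_done = recognizer.finished()  # all input has been provided
    outputs.extend(drain(recognizer))  # outputs for remaining utterances
    return num_done, outputs


class _StubBatchRecognizer:
    """Single-threaded stand-in; NOT the PyKaldi class."""

    def __init__(self):
        self._pending, self._ready, self._done = deque(), deque(), 0

    def accept_input(self, key, feats):
        self._pending.append((key, feats))

    def finished(self):
        while self._pending:
            key, feats = self._pending.popleft()
            self._ready.append({"key": key, "text": " ".join(map(str, feats))})
            self._done += 1
        return self._done

    def get_output(self):
        if not self._ready:
            raise ValueError("no output to return")
        return self._ready.popleft()


num_done, outputs = decode_batch(
    _StubBatchRecognizer(), [("utt1", [1, 2]), ("utt2", [3])])
print(num_done, [o["key"] for o in outputs])
```
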
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, compute_opts=None, num_threads=1, online_ivector_period=10)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • compute_opts (NnetBatchComputerOptions) – Configuration options for neural network batch computer.
  • num_threads (int) – Number of processing threads.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns:

A new recognizer.

Return type:

NnetLatticeFasterBatchRecognizer

get_output()[source]

Returns the next available output.

This returns the output for the first utterance in the output queue. The outputs returned by this method are guaranteed to be in the same order the inputs were provided, but they may be delayed and some outputs might be missing, for instance because of search failures.

This call does not block.

Output is a dictionary with the following (key, value) pairs:

key          value               value type
“key”        Utterance ID        str
“lattice”    Output lattice      Lattice or CompactLattice
“text”       Output transcript   str

The “lattice” output will be a deterministic compact lattice if lattice determinization is enabled. Otherwise, it will be a raw state-level lattice. The acoustic scores in the output lattice will already be divided by the acoustic scale used in decoding.

If the decoder was not initialized with a symbol table, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols.

Returns:A dictionary representing decoding output.
Raises:ValueError – If there is no output to return.
get_outputs()[source]

Creates a generator for iterating over available outputs.

Each output generated will be a dictionary like the output of get_output(). The outputs are generated in the same order the inputs were provided.

See Also: get_output()
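
Since some outputs might be missing, for instance because of search failures, it can help to index the generated outputs by their “key” entry and compare them with the keys that were submitted. The sketch below does this for any iterable of output dictionaries; collect_outputs and the fake outputs are assumptions made for illustration, with a plain iterator standing in for get_outputs().

```python
def collect_outputs(output_iter, submitted_keys):
    """Index outputs by utterance ID and list the keys that never came back."""
    by_key = {out["key"]: out for out in output_iter}
    missing = [k for k in submitted_keys if k not in by_key]
    return by_key, missing


# Stand-in for recognizer.get_outputs(); "utt2" is deliberately absent.
fake_outputs = iter([
    {"key": "utt1", "text": "hello"},
    {"key": "utt3", "text": "world"},
])
by_key, missing = collect_outputs(fake_outputs, ["utt1", "utt2", "utt3"])
print(missing)
```
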

static read_model(model_rxfilename)[source]

Reads model from an extended filename.

utterance_failed()[source]

Informs the decoder that there was a problem with an utterance.

This will update the number of failed utterances stats.

class kaldi.asr.NnetLatticeFasterGrammarRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]

Neural network based lattice generating faster grammar speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (LatticeFasterGrammarDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:input (object) – Input to decode.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
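
The “weight” and “likelihood” entries can be related with plain arithmetic. The sketch below assumes that costs are negated log-probabilities, so that the log-likelihood of the best path is the negated sum of the graph score and the acoustic score. That relationship is an assumption of this example rather than something stated above, and a (graph-score, acoustic-score) tuple stands in for a real LatticeWeight.

```python
def total_cost(weight):
    """Sum the two components of a (graph-score, acoustic-score) pair."""
    graph_score, acoustic_score = weight
    return graph_score + acoustic_score


def loglike_from_weight(weight):
    """Assumed relationship: log-likelihood is the negated total cost."""
    return -total_cost(weight)


weight = (7.5, 12.25)  # made-up (graph-score, acoustic-score) values
print(total_cost(weight), loglike_from_weight(weight))
```
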
classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns:

A new recognizer.

Return type:

NnetLatticeFasterGrammarRecognizer

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.NnetLatticeBiglmFasterRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, online_ivector_period=10)[source]

Neural network based lattice generating big-LM faster speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (LatticeBiglmFasterDecoder) – The decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
decode(input)

Decodes input.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:input (object) – Input to decode.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
classmethod from_files(model_rxfilename, graph_rxfilename, old_lm_rxfilename, new_lm_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, online_ivector_period=10)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • old_lm_rxfilename (str) – Extended filename for reading the old LM.
  • new_lm_rxfilename (str) – Extended filename for reading the new LM.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleComputationOptions) – Configuration options for simple nnet3 am decodable objects.
  • online_ivector_period (int) – Online ivector period. Relevant only if online ivectors are used.
Returns:

A new recognizer.

Return type:

NnetLatticeBiglmFasterRecognizer

read_model(model_rxfilename)

Reads model from an extended filename.

class kaldi.asr.OnlineRecognizer(decoder, symbols=None, allow_partial=True, acoustic_scale=0.1)[source]

Base class for online speech recognizers.

Parameters:
  • decoder (object) – The online decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • acoustic_scale (float) – Acoustic score scale.
advance_decoding(max_num_frames=-1)[source]

Advances decoding.

This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.

Parameters:max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.
decode()[source]

Decodes all frames in the input pipeline and returns the output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
finalize_decoding()[source]

Finalizes decoding.

This method may optionally be called after advance_decoding(), when you do not plan to decode any further. It performs an extra pruning step that prunes the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final state survived), and it also does a final pruning pass that visits all states (the pruning done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). After calling this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs=False.

get_output()[source]

Returns decoding output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
get_partial_output(use_final_probs=False)[source]

Returns partial decoding output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              Lattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
init_decoding()[source]

Initializes decoding.

This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

set_input_pipeline(input_pipeline)[source]

Sets input pipeline.

Parameters:input_pipeline (object) – Input pipeline to decode online.
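
The methods above are typically combined into a chunk-by-chunk loop: set the input pipeline, initialize decoding, advance as input arrives, read partial output, then finalize and fetch the full output. The sketch below illustrates that order of calls. decode_online, feed_chunk and _StubOnlineRecognizer are hypothetical; in particular, how input is pushed into a real pipeline is not covered here, so it is delegated to a caller-supplied function, and the stub uses a plain list as its "pipeline".

```python
def decode_online(recognizer, pipeline, chunks, feed_chunk):
    """Decode chunk by chunk; return partial transcripts and final output.

    feed_chunk(pipeline, chunk) is a caller-supplied (hypothetical) function
    that pushes one chunk of input into the pipeline.
    """
    recognizer.set_input_pipeline(pipeline)
    recognizer.init_decoding()
    partials = []
    for chunk in chunks:
        feed_chunk(pipeline, chunk)
        recognizer.advance_decoding()  # decode all frames ready so far
        partials.append(recognizer.get_partial_output()["text"])
    recognizer.finalize_decoding()  # advance_decoding() must not follow
    return partials, recognizer.get_output()


class _StubOnlineRecognizer:
    """Stand-in whose pipeline is a list of tokens; NOT a PyKaldi class."""

    def set_input_pipeline(self, pipeline):
        self._pipeline = pipeline

    def init_decoding(self):
        self._decoded, self._final = [], False

    def advance_decoding(self, max_num_frames=-1):
        if self._final:
            raise RuntimeError("decoding already finalized")
        self._decoded = list(self._pipeline)

    def finalize_decoding(self):
        self._final = True

    def get_partial_output(self, use_final_probs=False):
        return {"text": " ".join(self._decoded)}

    def get_output(self):
        return {"text": " ".join(self._decoded)}


partials, final = decode_online(
    _StubOnlineRecognizer(), [], ["a", "b", "c"],
    feed_chunk=lambda pipeline, chunk: pipeline.append(chunk))
print(partials, final["text"])
```
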
class kaldi.asr.NnetOnlineRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)[source]

Base class for neural network based online speech recognizers.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (object) – The online decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  • endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
advance_decoding(max_num_frames=-1)

Advances decoding.

This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.

Parameters:max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.
decode()

Decodes all frames in the input pipeline and returns the output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
finalize_decoding()

Finalizes decoding.

This method may optionally be called after advance_decoding(), when you do not plan to decode any further. It performs an extra pruning step that prunes the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final state survived), and it also does a final pruning pass that visits all states (the pruning done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). After calling this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs=False.

get_output()

Returns decoding output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
get_partial_output(use_final_probs=False)

Returns partial decoding output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              Lattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters:use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
init_decoding()

Initializes decoding.

This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

static read_model(model_rxfilename)[source]

Reads model from an extended filename.

set_input_pipeline(input_pipeline)

Sets input pipeline.

Parameters:input_pipeline (object) – Input pipeline to decode online.
class kaldi.asr.NnetLatticeFasterOnlineRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)[source]

Neural network based lattice generating faster online speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (LatticeFasterOnlineDecoder) – The online decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  • endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
advance_decoding(max_num_frames=-1)

Advances decoding.

This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.

Parameters:max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.
decode()

Decodes all frames in the input pipeline and returns the output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns:A dictionary representing decoding output.
Raises:RuntimeError – If decoding fails.
endpoint_detected()[source]

Determines if any of the endpointing rules are active.
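
One possible use of endpoint_detected() is to split a continuous stream into utterances: when an endpoint is reported, finalize decoding, fetch the output, and call init_decoding() to start a new utterance. The sketch below shows that control flow only. decode_segments, feed_chunk and _StubEndpointRecognizer are hypothetical, the stub treats the token "<sil>" as an endpoint, and any resetting a real input pipeline may need between utterances is left out.

```python
def decode_segments(recognizer, pipeline, chunks, feed_chunk):
    """Split a stream into utterances at detected endpoints."""
    recognizer.set_input_pipeline(pipeline)
    recognizer.init_decoding()
    segments, pending = [], False
    for chunk in chunks:
        feed_chunk(pipeline, chunk)
        recognizer.advance_decoding()
        pending = True
        if recognizer.endpoint_detected():
            recognizer.finalize_decoding()
            segments.append(recognizer.get_output()["text"])
            recognizer.init_decoding()  # start a new utterance
            pending = False
    if pending:  # flush input that arrived after the last endpoint
        recognizer.finalize_decoding()
        segments.append(recognizer.get_output()["text"])
    return segments


class _StubEndpointRecognizer:
    """Stand-in treating "<sil>" as an endpoint; NOT a PyKaldi class."""

    def set_input_pipeline(self, pipeline):
        self._pipeline = pipeline

    def init_decoding(self):
        self._start, self._tokens = len(self._pipeline), []

    def advance_decoding(self, max_num_frames=-1):
        self._tokens = self._pipeline[self._start:]

    def endpoint_detected(self):
        return bool(self._tokens) and self._tokens[-1] == "<sil>"

    def finalize_decoding(self):
        pass

    def get_output(self):
        return {"text": " ".join(t for t in self._tokens if t != "<sil>")}


segments = decode_segments(
    _StubEndpointRecognizer(), [], ["hi", "there", "<sil>", "bye"],
    feed_chunk=lambda pipeline, chunk: pipeline.append(chunk))
print(segments)
```
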

finalize_decoding()

Finalizes decoding.

This method may optionally be called after advance_decoding(), when you do not plan to decode any further. It performs an extra pruning step that prunes the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final state survived), and it also does a final pruning pass that visits all states (the pruning done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). After calling this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs=False.

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, endpoint_opts=None)[source]

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  • endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
Returns:

A new recognizer.

Return type:

NnetLatticeFasterOnlineRecognizer

get_output()

Returns decoding output.

Output is a dictionary with the following (key, value) pairs:

key            value                          value type
“alignment”    Frame-level alignment          List[int]
“best_path”    Best lattice path              CompactLattice
“lattice”      Output lattice                 Lattice or CompactLattice
“likelihood”   Log-likelihood of best path    float
“text”         Output transcript              str
“weight”       Cost of best path              LatticeWeight
“words”        Words on best path             List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
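
Because the “lattice” entry is only present for lattice-generating decoders, code that consumes the output dictionary should look it up defensively. The following is a minimal sketch of such a consumer. The helper name summarize_output and the summary keys are hypothetical and not part of PyKaldi; only the dictionary keys listed above are taken from the documentation.

```python
def summarize_output(out):
    """Return a small, printable summary of a decoding output dictionary."""
    summary = {
        # "text" holds symbols, or space separated integer indices when
        # the recognizer was constructed without a symbol table.
        "text": out["text"],
        "num_words": len(out["words"]),
        # "alignment" is frame-level, so its length is the number of
        # aligned frames.
        "num_aligned_frames": len(out["alignment"]),
        "likelihood": out["likelihood"],
    }
    # "lattice" may be missing, so use .get() instead of indexing.
    summary["has_lattice"] = out.get("lattice") is not None
    return summary
```

The helper only reads plain Python values, so it can be exercised with a hand-built dictionary before wiring it to a real recognizer.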
get_partial_output(use_final_probs=False)

Returns partial decoding output.

Output is a dictionary with the following (key, value) pairs:

key           value                         value type
“alignment”   Frame-level alignment         List[int]
“best_path”   Best lattice path             Lattice
“likelihood”  Log-likelihood of best path   float
“text”        Output transcript             str
“weight”      Cost of best path             LatticeWeight
“words”       Words on best path            List[int]

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
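
When no symbol table was supplied, the “text” output is a string of integer indices, and mapping it back to words is left to the caller. The sketch below shows one way to do that with a plain dictionary. Both helpers and the "<unk>" placeholder are hypothetical and not part of PyKaldi; the only assumption about the symbols file is the usual OpenFst text layout of one "symbol index" pair per line.

```python
def read_id_to_symbol(lines):
    """Build an {index: symbol} dict from 'symbol index' text lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        symbol, index = fields
        table[int(index)] = symbol
    return table


def indices_to_symbols(text, id_to_symbol, unknown="<unk>"):
    """Map a string of space separated integer indices to symbols."""
    symbols = []
    for token in text.split():
        # Indices missing from the table are replaced by a placeholder.
        symbols.append(id_to_symbol.get(int(token), unknown))
    return " ".join(symbols)
```

In practice, passing the symbol table to the recognizer is simpler; a helper like this is mainly useful when the indices have already been stored.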
init_decoding()

Initializes decoding.

This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

read_model(model_rxfilename)

Reads model from an extended filename.

set_input_pipeline(input_pipeline)

Sets input pipeline.

Parameters: input_pipeline (object) – Input pipeline to decode online.

class kaldi.asr.NnetLatticeFasterOnlineGrammarRecognizer(transition_model, acoustic_model, decoder, symbols=None, allow_partial=True, decodable_opts=None, endpoint_opts=None)

Neural network based lattice generating faster online grammar speech recognizer.

Parameters:
  • transition_model (TransitionModel) – The transition model.
  • acoustic_model (AmNnetSimple) – The acoustic model.
  • decoder (LatticeFasterOnlineGrammarDecoder) – The online decoder.
  • symbols (SymbolTable) – The symbol table. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  • endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
advance_decoding(max_num_frames=-1)

Advances decoding.

This will decode until there are no more frames ready in the input pipeline or max_num_frames are decoded. You can keep calling this as more frames become available.

Parameters: max_num_frames (int) – Maximum number of frames to decode. If negative, all available frames are decoded.
decode()

Decodes all frames in the input pipeline and returns the output.

Output is a dictionary with the following (key, value) pairs:

key           value                         value type
“alignment”   Frame-level alignment         List[int]
“best_path”   Best lattice path             CompactLattice
“lattice”     Output lattice                Lattice or CompactLattice
“likelihood”  Log-likelihood of best path   float
“text”        Output transcript             str
“weight”      Cost of best path             LatticeWeight
“words”       Words on best path            List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: input (object) – Input to decode.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.

endpoint_detected()

Determines if any of the endpointing rules are active.

finalize_decoding()

Finalizes decoding.

This function may optionally be called after advance_decoding(), when you do not plan to decode any further. It performs an extra pruning step that prunes the output lattices more accurately, particularly toward the end of the utterance. It does this by using the final-probs in pruning (if any final state survived), and by running a final pruning pass that visits all states (the pruning done during decoding may fail to prune states that are within pruning_scale = 0.1 outside of the beam). After calling this, you cannot call advance_decoding() again (it will fail), and you cannot call get_lattice and related functions with use_final_probs=False.
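
The ordering constraints among init_decoding(), advance_decoding(), endpoint_detected(), finalize_decoding() and get_output() can be sketched as a small driver loop. This is an illustrative sketch, not PyKaldi code: decode_stream and feed_chunk are hypothetical names, the recognizer is any object exposing the methods documented here, and feed_chunk is assumed to be a caller-supplied callable that pushes new data into the input pipeline (including whatever end-of-input signalling that pipeline needs).

```python
def decode_stream(recognizer, chunks, feed_chunk, max_num_frames=-1):
    """Drive an online recognizer chunk by chunk and return its output."""
    # Start a new utterance; required before advance_decoding().
    recognizer.init_decoding()
    for chunk in chunks:
        # Hypothetical callable: makes more frames available to the
        # input pipeline that was set on the recognizer.
        feed_chunk(chunk)
        # Decode the frames that are ready (all of them if negative).
        recognizer.advance_decoding(max_num_frames)
        # Stop early once an endpointing rule is active.
        if recognizer.endpoint_detected():
            break
    # After this call, advance_decoding() must not be called again
    # for the current utterance.
    recognizer.finalize_decoding()
    return recognizer.get_output()
```

To decode another utterance with the same recognizer, the loop would simply be run again, since it begins with init_decoding().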

classmethod from_files(model_rxfilename, graph_rxfilename, symbols_filename=None, allow_partial=True, decoder_opts=None, decodable_opts=None, endpoint_opts=None)

Constructs a new recognizer from given files.

Parameters:
  • model_rxfilename (str) – Extended filename for reading the model.
  • graph_rxfilename (str) – Extended filename for reading the graph.
  • symbols_filename (str) – The symbols file. If provided, “text” output of decode() includes symbols instead of integer indices.
  • allow_partial (bool) – Whether to output decoding results if no final state was active on the last frame.
  • decoder_opts (LatticeFasterDecoderOptions) – Configuration options for the decoder.
  • decodable_opts (NnetSimpleLoopedComputationOptions) – Configuration options for simple looped neural network computation.
  • endpoint_opts (OnlineEndpointConfig) – Online endpointing configuration.
Returns: A new recognizer.
Return type: NnetLatticeFasterOnlineGrammarRecognizer

get_output()

Returns decoding output.

Output is a dictionary with the following (key, value) pairs:

key           value                         value type
“alignment”   Frame-level alignment         List[int]
“best_path”   Best lattice path             CompactLattice
“lattice”     Output lattice                Lattice or CompactLattice
“likelihood”  Log-likelihood of best path   float
“text”        Output transcript             str
“weight”      Cost of best path             LatticeWeight
“words”       Words on best path            List[int]

The “lattice” output is produced only if the decoder can generate lattices. It will be a deterministic compact lattice if the decoder is configured to determinize lattices. Otherwise, it will be a raw state-level lattice.

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
get_partial_output(use_final_probs=False)

Returns partial decoding output.

Output is a dictionary with the following (key, value) pairs:

key           value                         value type
“alignment”   Frame-level alignment         List[int]
“best_path”   Best lattice path             Lattice
“likelihood”  Log-likelihood of best path   float
“text”        Output transcript             str
“weight”      Cost of best path             LatticeWeight
“words”       Words on best path            List[int]

If symbols is None, the “text” output will be a string of space separated integer indices. Otherwise it will be a string of space separated symbols. The “weight” output is a lattice weight consisting of (graph-score, acoustic-score).

Parameters: use_final_probs (bool) – Whether to use final probabilities when computing best path.
Returns: A dictionary representing decoding output.
Raises: RuntimeError – If decoding fails.
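
Since get_partial_output() raises RuntimeError when decoding fails, a caller that polls for interim hypotheses may want to treat that case as "nothing new yet". The sketch below illustrates one such pattern. The helper poll_partial_text is hypothetical and not part of PyKaldi, and swallowing the error is an assumption about what the caller wants rather than documented behaviour.

```python
def poll_partial_text(recognizer, last_text=None, use_final_probs=False):
    """Return (text, changed) for the current partial hypothesis."""
    try:
        out = recognizer.get_partial_output(use_final_probs)
    except RuntimeError:
        # No usable partial output; keep the previous text.
        return last_text, False
    text = out["text"]
    # Report a change only when the transcript differs from last time.
    return text, text != last_text
```

A caller would typically invoke this after each advance_decoding() call and display the text only when the second element of the result is True.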
init_decoding()

Initializes decoding.

This should only be used if you intend to call advance_decoding(). If you intend to call decode(), you don’t need to call this. You can also call this method if you have already decoded an utterance and want to start with a new utterance.

read_model(model_rxfilename)

Reads model from an extended filename.

set_input_pipeline(input_pipeline)

Sets input pipeline.

Parameters: input_pipeline (object) – Input pipeline to decode online.

class kaldi.asr.LatticeLmRescorer(old_lm, new_lm, phi_label=None)

Lattice LM rescorer.

If phi_label is provided, rescoring will be “exact” in the sense that back-off arcs in the new LM will only be taken if there are no other matching arcs. Inexact rescoring can overestimate the new LM scores for some paths in the output lattice. This happens when back-off paths have higher scores than matching regular paths in the new LM.

Parameters:
  • old_lm (StdFst) – Old language model FST.
  • new_lm (StdFst) – New language model FST.
  • phi_label (int) – Back-off label in the new LM.
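
The difference between exact and inexact rescoring can be illustrated with a toy model of a single LM state. This is not Kaldi's implementation: the function lm_cost, the dictionary-based LM and the numbers are all hypothetical. Costs are negative log probabilities, so a lower cost means a higher score.

```python
def lm_cost(word, direct_arcs, backoff_cost, unigram_costs, exact):
    """Cost of `word` leaving one LM state of a toy back-off model."""
    # Cost of taking the back-off arc and then the unigram arc.
    backoff_path = backoff_cost + unigram_costs[word]
    if word not in direct_arcs:
        # No matching arc: backing off is the only option either way.
        return backoff_path
    if exact:
        # Exact: the back-off arc is taken only when nothing matches.
        return direct_arcs[word]
    # Inexact: both paths are allowed, so the cheaper one wins.
    return min(direct_arcs[word], backoff_path)


direct = {"cat": 5.0}               # matching arc in this state
unigrams = {"cat": 2.0, "dog": 4.0}  # costs after backing off
exact_cost = lm_cost("cat", direct, 1.0, unigrams, exact=True)
inexact_cost = lm_cost("cat", direct, 1.0, unigrams, exact=False)
```

Here the back-off path for "cat" (1.0 + 2.0) is cheaper than the matching arc (5.0), so the inexact variant assigns "cat" a better score than the LM intends, which is the overestimation described above.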
classmethod from_files(old_lm_rxfilename, new_lm_rxfilename, phi_label=None)

Constructs a new lattice LM rescorer from given files.

Parameters:
  • old_lm_rxfilename (str) – Extended filename for reading the old LM.
  • new_lm_rxfilename (str) – Extended filename for reading the new LM.
  • phi_label (int) – Back-off label in the new LM.
Returns: A new lattice LM rescorer.
Return type: LatticeLmRescorer

rescore(lat)

Rescores input lattice.

Parameters: lat (CompactLatticeFst) – Input lattice.
Returns: Rescored lattice.
Return type: CompactLatticeVectorFst