kaldi.lat

kaldi.lat.align

Functions

phone_align_lattice Aligns the phone labels and transition-ids.
read_lexicon_for_word_align Reads the lexicon in the special format required for word alignment.
test_word_aligned_lattice Verifies the output of word_align_lattice.
word_align_lattice Aligns the word labels and transition-ids.
word_align_lattice_lexicon Aligns the word labels and transition-ids using a lexicon.

Classes

PhoneAlignLatticeOptions Options for phone alignment.
WordAlignLatticeLexiconInfo This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.
WordAlignLatticeLexiconOpts Options for word alignment using a lexicon.
WordBoundaryInfo Word boundary information.
WordBoundaryInfoNewOpts Options for word alignment using word boundary phones.
class kaldi.lat.align.PhoneAlignLatticeOptions

Options for phone alignment.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
remove_epsilon

Whether to remove epsilon arcs from the phone lattice.

If replace_output_symbols is False, this will mean that an arc can have multiple phones on it.

reorder

Whether lattice was created from a graph with reorder option set.

replace_output_symbols

Whether to replace output symbols (typically words) with phones.

class kaldi.lat.align.WordAlignLatticeLexiconInfo(lexicon)

This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.

Parameters:lexicon (List[List[int]]) – The lexicon.
equivalence_class_of(word:int) → int

Returns the equivalence class for the word.

This function is used in testing code.

Words are mapped into equivalence classes derived from the mappings in the first two fields of each line in the lexicon. This function maps from each word-id to the lowest member of its equivalence class.
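The mapping described above can be sketched in plain Python with a small union-find. This is an illustration written for this page, not the PyKaldi implementation; `build_equivalence_classes` is a hypothetical helper, and it assumes that two word-ids are linked whenever they appear as the first two fields of the same lexicon entry.

```python
from typing import Dict, List


def build_equivalence_classes(lexicon: List[List[int]]) -> Dict[int, int]:
    """Map each word-id to the lowest word-id in its equivalence class.

    Hypothetical helper: two word-ids are linked when they appear as the
    first two fields (<old-word-id>, <new-word-id>) of the same entry.
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for entry in lexicon:
        old_word, new_word = entry[0], entry[1]
        root_a, root_b = find(old_word), find(new_word)
        if root_a != root_b:
            # Keep the smaller id as the root, so every root is the
            # lowest member of its class.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {word: find(word) for word in list(parent)}


# Toy lexicon: word 0 is rewritten to 7, and word 9 is rewritten to 5.
lexicon = [[0, 7, 1], [7, 7, 1], [5, 5, 2, 3], [9, 5, 2, 3]]
classes = build_equivalence_classes(lexicon)
```

Here `classes[7]` and `classes[0]` both resolve to 0, the lowest member of their shared class.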

is_valid_entry(entry:list<int>) → bool

Checks if entry is valid.

This function is used in testing code.

Returns:True if the entry interpreted as (output-word phone1 phone2 …) can appear in the lexicon.
class kaldi.lat.align.WordAlignLatticeLexiconOpts

Options for word alignment using a lexicon.

allow_duplicate_paths

Whether to allow duplicate paths in testing code.

max_expand

Maximum allowed ratio of #states in aligned lattice vs input lattice.

If >0.0, the maximum ratio by which we allow the lattice-alignment code to increase the #states in a lattice (vs. the phone-aligned lattice) before we fail and refuse to align the lattice. This is helpful in order to prevent ‘pathological’ lattices from causing the program to exhaust memory. Actual max-states is 1000 + max-expand * orig-num-states.
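The state limit quoted above is simple arithmetic. The sketch below only restates that formula; `max_states_allowed` is a hypothetical name, and treating a non-positive max_expand as "no limit" is an assumption based on the "If >0.0" wording.

```python
def max_states_allowed(max_expand: float, orig_num_states: int) -> float:
    """Return the state limit implied by max_expand.

    Assumption: a non-positive max_expand disables the check, which is
    modelled here as an infinite limit.
    """
    if max_expand <= 0.0:
        return float("inf")
    return 1000 + max_expand * orig_num_states


# A phone-aligned lattice with 2000 states and max_expand = 5.0.
limit = max_states_allowed(5.0, 2000)
```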

partial_word_label

Label for partial word arcs at the end of “forced-out” utterances.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
reorder

Whether lattice was created from a graph with reorder option set.

test

Whether to activate the testing code to validate the algorithm.

class kaldi.lat.align.WordBoundaryInfo(opts)

Word boundary information.

Parameters:opts (WordBoundaryInfoNewOpts) – Decoder options.
PhoneType

alias of WordBoundaryInfo.PhoneType

from_file(opts:WordBoundaryInfoNewOpts, word_boundary_file:str) → WordBoundaryInfo

Creates a new WordBoundaryInfo object from file.

init(is:istream)

Initializes with information read from an input stream.

partial_word_label

Label for partial word arcs at the end of “forced-out” utterances.

phone_to_type

Mapping from phone ids to phone types.

reorder

Whether lattice was created from a graph with reorder option set.

silence_label

Label for silence arcs.

type_of_phone(p:int) → PhoneType

Looks up the type of the given phone id.

Parameters:p (int) – The input phone id.
Returns:The type of input phone id.
Return type:PhoneType
class kaldi.lat.align.WordBoundaryInfoNewOpts

Options for word alignment using word boundary phones.

partial_word_label

Label for partial word arcs at the end of “forced-out” utterances.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
reorder

Whether lattice was created from a graph with reorder option set.

silence_label

Label for silence arcs.

kaldi.lat.align.phone_align_lattice(lat, tmodel, opts)

Aligns the phone labels and transition-ids.

Outputs a lattice in which the arcs correspond exactly to sequences of phones, so the boundaries between the arcs correspond to the boundaries between phones.

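Parameters:
  • lat (CompactLatticeVectorFst) – The input lattice.
  • tmodel (TransitionModel) – The transition model.
  • opts (PhoneAlignLatticeOptions) – The phone alignment options.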
Returns:

A tuple representing the return value and the output lattice. The return value is set to True if the operation was successful, False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the model.

Note

If this function returns False, it doesn’t mean the output lattice is necessarily bad. It might just be that the input lattice was “forced out” with partial words due to no final state being reached during decoding, and in this case the output might still be usable.

Note

If opts.remove_epsilon == True and opts.replace_output_symbols == False, an arc may have >1 phone on it, but the boundaries will still correspond with the boundaries between phones.

Note

If opts.replace_output_symbols == False, it is possible to have arcs with words on them but no transition-ids at all.

kaldi.lat.align.read_lexicon_for_word_align(rxfilename)

Reads the lexicon in the special format required for word alignment.

Each line has a series of integers on it (at least two on each line), representing:

<old-word-id> <new-word-id> [<phone-id-1> [<phone-id-2> … ] ]

Here, <old-word-id> is the word-id that appears in the lattice before alignment, and <new-word-id> is the word-id that should appear in the lattice after alignment. This is mainly useful when the lattice may have no symbol for the optional-silence arcs (so <old-word-id> would equal zero), but we want it to be output with a symbol on those arcs (so <new-word-id> would be nonzero). If the silence should not be added to the lattice, both <old-word-id> and <new-word-id> may be zero.

Parameters:rxfilename (str) – Extended filename for reading the lexicon.
Returns:The lexicon in the format required for word alignment.
Return type:List[List[int]]
Raises:ValueError – If reading the lexicon fails.
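To make the line format concrete, here is a minimal pure-Python parser for text in that layout. It does not use PyKaldi and does not understand extended filenames; `parse_word_align_lexicon` is a hypothetical helper, and skipping blank lines is an assumption.

```python
from typing import Iterable, List


def parse_word_align_lexicon(lines: Iterable[str]) -> List[List[int]]:
    """Parse lines of '<old-word-id> <new-word-id> [<phone-id> ...]'."""
    lexicon: List[List[int]] = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        try:
            entry = [int(field) for field in fields]
        except ValueError as err:
            raise ValueError(f"line {line_number}: non-integer field") from err
        if len(entry) < 2:
            raise ValueError(f"line {line_number}: expected at least two integers")
        lexicon.append(entry)
    return lexicon


# Optional silence mapped from 0 to word-id 1, a regular word, and a
# silence entry that should not be added to the lattice.
example = parse_word_align_lexicon(["0 1 5", "12 12 7 8 9", "0 0"])
```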
kaldi.lat.align.test_word_aligned_lattice(lat:CompactLatticeVectorFst, tmodel:TransitionModel, info:WordBoundaryInfo, aligned_lat:CompactLatticeVectorFst)

Verifies the output of word_align_lattice.

Parameters:
Raises:

RuntimeError – If verification fails.

kaldi.lat.align.word_align_lattice(lat, tmodel, info, max_states)

Aligns the word labels and transition-ids.

Aligns compact lattice so that each arc has the transition-ids on it that correspond to the word that is on that arc. It is OK for the lattice to have epsilon arcs for optional silences.

Parameters:
  • lat (CompactLatticeVectorFst) – The input lattice.
  • tmodel (TransitionModel) – The transition model.
  • info (WordBoundaryInfo) – The word boundary information.
  • max_states (int) – Maximum #states allowed in the output lattice. If max_states > 0 and the #states of the output will be greater than max_states, this function will abort the computation, return False and output an empty lattice.
Returns:

A tuple representing the return value and the output lattice. The return value is set to True if the operation was successful, False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the word boundary information.

Note

We don’t expect silence inside words, or empty words (words with no phones), and we expect the word to start with a wbegin_phone, to end with a wend_phone, and to possibly have winternal_phones inside (or to consist of just one wbegin_and_end_phone).

Note

If this function returns False, it doesn’t mean the output lattice is necessarily bad. It might just be that the input lattice was “forced out” with partial words due to no final state being reached during decoding, and in this case the output might still be usable.
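The expectation about word-internal structure stated in the first note can be written as a small check on a phone sequence. The sketch below is plain Python with hypothetical string labels for the phone types; it does not use WordBoundaryInfo.PhoneType and is only meant to show the rule.

```python
from typing import Dict, List

# Hypothetical labels mirroring the phone types named in the note.
WBEGIN = "wbegin"
WEND = "wend"
WINTERNAL = "winternal"
WBEGIN_AND_END = "wbegin_and_end"
NONWORD = "nonword"  # e.g. silence; not expected inside a word


def is_well_formed_word(phones: List[int], phone_to_type: Dict[int, str]) -> bool:
    """Check that phones form one word under the word-boundary convention."""
    types = [phone_to_type.get(phone) for phone in phones]
    if not types:
        return False  # empty words are not expected
    if len(types) == 1:
        return types[0] == WBEGIN_AND_END
    return (
        types[0] == WBEGIN
        and types[-1] == WEND
        and all(t == WINTERNAL for t in types[1:-1])
    )


phone_to_type = {1: WBEGIN, 2: WINTERNAL, 3: WEND, 4: WBEGIN_AND_END, 5: NONWORD}
ok_long = is_well_formed_word([1, 2, 2, 3], phone_to_type)
ok_single = is_well_formed_word([4], phone_to_type)
bad_silence_inside = is_well_formed_word([1, 5, 3], phone_to_type)
```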

kaldi.lat.align.word_align_lattice_lexicon(lat, tmodel, lexicon_info, opts)

Aligns the word labels and transition-ids using a lexicon.

Aligns compact lattice so that each arc has the transition-ids on it that correspond to the word that is on that arc. It is OK for the lattice to have epsilon arcs for optional silences.

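Parameters:
  • lat (CompactLatticeVectorFst) – The input lattice.
  • tmodel (TransitionModel) – The transition model.
  • lexicon_info (WordAlignLatticeLexiconInfo) – The lexicon information.
  • opts (WordAlignLatticeLexiconOpts) – The word alignment options.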
Returns:

A tuple representing the return value and the output lattice. The return value is set to True if the operation was successful, False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the lexicon information.

Note

If this function returns False, it doesn’t mean the output lattice is necessarily bad. It might just be that the input lattice was “forced out” with partial words due to no final state being reached during decoding, and in this case the output might still be usable.

kaldi.lat.functions

Functions

add_word_ins_pen_to_compact_lattice Adds the penalty term to the graph scores of arcs in the lattice.
compact_lattice_depth Computes the depth of the compact lattice.
compact_lattice_depth_per_frame Computes the per-frame depth of the compact lattice.
compact_lattice_limit_depth Limits the depth of the compact lattice.
compact_lattice_shortest_path Computes the shortest path in an acyclic compact lattice.
compact_lattice_to_word_alignment Extracts word alignment from a linear compact lattice.
compact_lattice_to_word_prons Extracts word pronunciations from a linear compact lattice.
compose_compact_lattice_deterministic Composes a compact lattice with a deterministic on-demand FST.
compose_compact_lattice_pruned Does pruned composition of a lattice and a deterministic on demand FST.
compute_compact_lattice_alphas Computes the forward scores (alpha) for compact lattice states.
compute_compact_lattice_betas Computes the backward scores (beta) for compact lattice states.
compute_lattice_alphas_and_betas Computes forward and backward scores for lattice states.
convert_compact_lattice_to_phones Replaces transition-ids in compact lattice with phones.
convert_lattice_to_phones Replaces output symbols in lattice with phones.
determinize_lattice_phone_pruned Applies a specialized determinization operation to a lattice.
determinize_lattice_pruned Applies a specialized determinization operation to a lattice.
get_per_frame_acoustic_costs Extracts per-frame log likelihoods from a linear lattice.
lattice_active_phones Computes the set of phones active on each frame.
lattice_boost Boosts graph scores in the lattice.
lattice_forward_backward Computes lattice arc posteriors using forward-backward algorithm.
lattice_forward_backward_mmi Computes lattice posteriors for MMI.
lattice_forward_backward_mpe_variants Computes lattice posteriors for MPFE (or SMBR).
lattice_state_times Extracts lattice state times (in terms of frames).
longest_sentence_length_in_lattice Returns the number of words in the longest sentence in a lattice.
minimize_compact_lattice Applies a specialized minimization operation to compact lattices.
prune_lattice Prunes a lattice.
push_compact_lattice_strings Pushes the transition-ids as far towards the start as they will go.
push_compact_lattice_weights Pushes the weights in compact lattice toward the start state.
rescore_compact_lattice_speedup Adjusts acoustic scores in the compact lattice.
rescore_lattice Adjusts acoustic scores in the lattice.
sentence_level_confidence Computes sentence level confidence scores.
top_sort_lattice_if_needed Topologically sorts the lattice if it is not already sorted.

Classes

ComposeLatticePrunedOptions Options for pruned lattice composition.
DeterminizeLatticePhonePrunedOptions Options for pruning and phone+word determinizing a lattice.
DeterminizeLatticePrunedOptions Options for pruning and word determinizing a lattice.
class kaldi.lat.functions.ComposeLatticePrunedOptions

Options for pruned lattice composition.

growth_ratio

Determines how much num-arcs can grow on each outer iteration (default=1.5).

initial_num_arcs

Number of arcs used on the first outer iteration (default=100).

lattice_compose_beam

Beam width explored during composition (default=6.0).

max_arcs

Maximum number of arcs to expand (default=100000).

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
class kaldi.lat.functions.DeterminizeLatticePhonePrunedOptions

Options for pruning and phone+word determinizing a lattice.

delta

A small offset used to measure equality of weights.

max_mem

Maximum memory threshold for the determinization operation.

If > 0, determinization will fail and return false when the algorithm’s (approximate) memory consumption crosses this threshold.

minimize

Whether to push and minimize the output after determinization.

phone_determinize

Whether to do a first pass determinization on both phones and words.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
word_determinize

Whether to do a second pass determinization on words only.

class kaldi.lat.functions.DeterminizeLatticePrunedOptions

Options for pruning and word determinizing a lattice.

delta

A small offset used to measure equality of weights.

max_arcs

Maximum number of arcs allowed in output FST.

max_loop

Maximum loop threshold for the determinization operation.

If >0, can be used to detect non-determinizable input (a case that wouldn’t be caught by max_mem).

max_mem

Maximum memory threshold for the determinization operation.

If > 0, determinization will fail and return false when the algorithm’s (approximate) memory consumption crosses this threshold.

max_states

Maximum number of states allowed in output FST.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
retry_cutoff

Cutoff value used when retrying a failed determinization operation.

Controls pruning un-determinized lattice and retrying determinization: if effective-beam < retry-cutoff * beam, we prune the raw lattice and retry. Avoids ever getting empty output for long segments.
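The retry condition above reduces to a single comparison. A minimal sketch, assuming `effective_beam` is the beam actually reached by a failed determinization attempt; `should_retry` is a hypothetical name, not a PyKaldi function.

```python
def should_retry(effective_beam: float, beam: float, retry_cutoff: float) -> bool:
    """Return True when the raw lattice would be pruned and determinization retried."""
    return effective_beam < retry_cutoff * beam


# With beam = 10.0 and retry_cutoff = 0.5, the threshold is 5.0.
retry_small = should_retry(3.0, 10.0, 0.5)
retry_large = should_retry(6.0, 10.0, 0.5)
```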

kaldi.lat.functions.add_word_ins_pen_to_compact_lattice(word_ins_penalty:float, clat:CompactLatticeVectorFst)

Adds the penalty term to the graph scores of arcs in the lattice.

kaldi.lat.functions.compact_lattice_depth(clat:CompactLatticeVectorFst) -> (depth:float, num_frames:int)

Computes the depth of the compact lattice.

Returns the depth of the lattice, defined as the average number of arcs (or final-prob strings) crossing any given frame, and the number of frames. Lattice depth is 1 for empty lattices. Requires that the lattice is topologically sorted!

kaldi.lat.functions.compact_lattice_depth_per_frame(clat:CompactLatticeVectorFst) → list<int>

Computes the per-frame depth of the compact lattice.

Returns the per-frame depth of the lattice, defined as the number of arcs (or final-prob strings) crossing any given frame. Requires that the lattice is topologically sorted!
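The two depth definitions can be illustrated on a toy lattice in which each arc is reduced to the frames it spans. This is a pure-Python sketch, not the PyKaldi code path: arcs are given as hypothetical (begin_frame, num_frames) pairs, and final-prob strings are ignored for simplicity.

```python
from typing import List, Tuple


def depth_per_frame(arcs: List[Tuple[int, int]], num_frames: int) -> List[int]:
    """Count, for every frame, how many arcs cross it."""
    depths = [0] * num_frames
    for begin, length in arcs:
        for frame in range(begin, begin + length):
            depths[frame] += 1
    return depths


def average_depth(arcs: List[Tuple[int, int]], num_frames: int) -> float:
    """Average number of arcs crossing a frame; 1.0 when there are no frames."""
    if num_frames == 0:
        return 1.0
    return sum(depth_per_frame(arcs, num_frames)) / num_frames


# Five arcs over a four-frame utterance.
arcs = [(0, 2), (0, 3), (2, 2), (3, 1), (0, 1)]
per_frame = depth_per_frame(arcs, 4)
avg = average_depth(arcs, 4)
```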

kaldi.lat.functions.compact_lattice_limit_depth(max_arcs_per_frame:int, clat:CompactLatticeVectorFst)

Limits the depth of the compact lattice.

Does not allow more than a specified number of arcs active on any given frame. This can be used to reduce the size of the “very deep” portions of the lattice.

kaldi.lat.functions.compact_lattice_shortest_path(clat:CompactLatticeVectorFst) → CompactLatticeVectorFst

Computes the shortest path in an acyclic compact lattice.

kaldi.lat.functions.compact_lattice_to_word_alignment(clat:CompactLatticeVectorFst) -> (success:bool, words:list<int>, begin_times:list<int>, lengths:list<int>)

Extracts word alignment from a linear compact lattice.

This function takes a compact lattice that should only contain a single linear sequence (e.g. output of compact_lattice_shortest_path()), and that should have been processed so that the arcs align correctly with the word boundaries (e.g. by word_align_lattice()). It outputs 3 lists of the same size, which represent, for each word in the lattice (in sequence), the word label and the begin time and length in frames. This is done even for zero (epsilon) words, generally corresponding to optional silence – if you don’t want them, just ignore them in the output.

Raises:
  • ValueError – If the lattice does not have the correct format (e.g. if it is empty or if it is not linear).
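A common follow-up is to turn the three output lists into timed word segments. The sketch below works on plain Python lists shaped like the documented outputs; `alignment_to_segments` is a hypothetical helper, and the 0.01 s frame shift is an assumption that depends on the feature configuration.

```python
from typing import List, Tuple


def alignment_to_segments(
    words: List[int],
    begin_times: List[int],
    lengths: List[int],
    frame_shift: float = 0.01,  # assumption: 10 ms frames
    skip_epsilon: bool = True,
) -> List[Tuple[int, float, float]]:
    """Return (word_id, start_seconds, end_seconds) for each word."""
    segments = []
    for word, begin, length in zip(words, begin_times, lengths):
        if skip_epsilon and word == 0:
            continue  # zero (epsilon) words usually mark optional silence
        segments.append((word, begin * frame_shift, (begin + length) * frame_shift))
    return segments


# Lists shaped like the words, begin_times and lengths outputs.
segments = alignment_to_segments([0, 15, 0, 42], [0, 12, 40, 45], [12, 28, 5, 30])
```

With `skip_epsilon=False` the epsilon entries are kept, which is useful when silence durations matter.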
kaldi.lat.functions.compact_lattice_to_word_prons(tmodel:TransitionModel, clat:CompactLatticeVectorFst) -> (success:bool, words:list<int>, begin_times:list<int>, lengths:list<int>, prons:list<list<int>>, phone_lengths:list<list<int>>)

Extracts word pronunciations from a linear compact lattice.

This function takes a compact lattice that should only contain a single linear sequence (e.g. output of compact_lattice_shortest_path()), and that should have been processed so that the arcs align correctly with the word boundaries (e.g. by word_align_lattice()). It outputs lists of the same size, which represent, for each word in the lattice (in sequence), the word label, the begin time, the length in frames, and the pronunciation (sequence of phones), along with the per-phone lengths returned as phone_lengths. This is done even for zero (epsilon) words, corresponding to optional silences – if you don’t want them, just ignore them in the output.

Raises:
  • ValueError – If the lattice does not have the correct format (e.g. if it is empty or if it is not linear).
kaldi.lat.functions.compose_compact_lattice_deterministic(clat:CompactLatticeVectorFst, det_fst:StdDeterministicOnDemandFst) → CompactLatticeVectorFst

Composes a compact lattice with a deterministic on-demand FST.

This function is used in language model rescoring. Composition affects only graph costs. The output is another compact lattice.

kaldi.lat.functions.compose_compact_lattice_pruned(opts:ComposeLatticePrunedOptions, clat:CompactLatticeVectorFst, det_fst:StdDeterministicOnDemandFst) → CompactLatticeVectorFst

Does pruned composition of a lattice and a deterministic on demand FST.

Parameters:
Returns:

Output lattice.

Return type:

CompactLatticeVectorFst

kaldi.lat.functions.compute_compact_lattice_alphas(clat:CompactLatticeVectorFst) -> (success:bool, alpha:list<float>)

Computes the forward scores (alpha) for compact lattice states.

kaldi.lat.functions.compute_compact_lattice_betas(clat:CompactLatticeVectorFst) -> (success:bool, beta:list<float>)

Computes the backward scores (beta) for compact lattice states.

kaldi.lat.functions.compute_lattice_alphas_and_betas(lat, viterbi)

Computes forward and backward scores for lattice states.

If viterbi == True, computes the Viterbi scores, i.e. forward (alpha) and backward (beta) scores are the scores of best paths reaching and leaving each state. Otherwise, computes regular forward and backward scores. Note that alphas and betas are negated costs. Requires the input lattice to be topologically sorted.

Parameters:
Returns:

The total-prob (or best-path prob), the forward (alpha) scores and the backward (beta) scores.

Return type:

Tuple[float, List[float], List[float]]
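The difference between the Viterbi and the regular scores can be seen on a toy graph. The following is a pure-Python sketch of the forward and backward recursions, not the PyKaldi implementation: arcs are hypothetical (src, dst, cost) triples with src < dst (so state ids are already in topological order), state 0 is assumed to be the start state, and scores are negated costs.

```python
import math
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def alphas_and_betas(
    num_states: int,
    arcs: List[Tuple[int, int, float]],
    final_costs: Dict[int, float],
    viterbi: bool,
) -> Tuple[float, List[float], List[float]]:
    """Return (total score, alphas, betas) for a topologically sorted graph."""
    combine = max if viterbi else log_add
    alpha = [NEG_INF] * num_states
    beta = [NEG_INF] * num_states
    alpha[0] = 0.0
    # Forward pass: visiting arcs by increasing source state means every
    # alpha is complete before it is used.
    for src, dst, cost in sorted(arcs):
        alpha[dst] = combine(alpha[dst], alpha[src] - cost)
    # Backward pass: start from the final costs, then visit arcs by
    # decreasing source state.
    for state, cost in final_costs.items():
        beta[state] = -cost
    for src, dst, cost in sorted(arcs, reverse=True):
        beta[src] = combine(beta[src], beta[dst] - cost)
    return beta[0], alpha, beta


# Two paths from state 0 to state 3 with total costs 2.0 and 2.5.
arcs = [(0, 1, 1.0), (0, 2, 2.0), (1, 3, 1.0), (2, 3, 0.5)]
best, v_alpha, v_beta = alphas_and_betas(4, arcs, {3: 0.0}, viterbi=True)
total, f_alpha, f_beta = alphas_and_betas(4, arcs, {3: 0.0}, viterbi=False)
```

In the Viterbi case `best` is the score of the single best path; otherwise `total` sums over both paths in the log domain.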

kaldi.lat.functions.convert_compact_lattice_to_phones(trans_model:TransitionModel, lat:CompactLatticeVectorFst)

Replaces transition-ids in compact lattice with phones.

Given a lattice, and a transition model to map pdf-ids to phones, replaces the sequences of transition-ids with sequences of phones. Note that this is different from convert_lattice_to_phones(), in that it replaces the transition-ids not the words.

kaldi.lat.functions.convert_lattice_to_phones(trans_model:TransitionModel, lat:LatticeVectorFst)

Replaces output symbols in lattice with phones.

Given a lattice, and a transition model to map pdf-ids to phones, replaces the output symbols (presumably words), with phones. Uses the trans_model to work out the phone sequence. Note that the phone labels are not exactly aligned with the phone boundaries. Inserted phone labels coincide with any transition to the final, nonemitting state of a phone (this state always exists). This would be the last transition-id in the phone if reordering is not done (but this is typically not the case).

kaldi.lat.functions.determinize_lattice_phone_pruned(ifst, trans_model, prune, opts=None, destructive=True)

Applies a specialized determinization operation to a lattice.

Determinizes a raw state-level lattice, keeping only the best output-symbol sequence (typically transition ids) for each input-symbol sequence. This version does phone insertion when doing a first pass determinization (if opts.phone_determinize == True), it then removes the inserted phones and does a second pass determinization on the word lattice (if opts.word_determinize == True). It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup.

Parameters:
  • ifst (LatticeFst) – The input lattice.
  • trans_model (TransitionModel) – The transition model.
  • prune (float) – The pruning beam.
  • opts (DeterminizeLatticePhonePrunedOptions) – The options for lattice determinization.
  • destructive (bool) – Whether to use the destructive version of the algorithm which mutates input lattice.
Returns:

The output lattice.

Return type:

CompactLatticeVectorFst

Note

The point of doing first a phone-level determinization pass and then a word-level determinization pass is that it allows us to determinize deeper lattices without “failing early” and returning a too-small lattice due to the max-mem constraint. The result should be the same as word-level determinization in general, but for deeper lattices it is a bit faster, despite the fact that we now have two passes of determinization by default.

kaldi.lat.functions.determinize_lattice_pruned(ifst, prune, opts=None, compact_out=True)

Applies a specialized determinization operation to a lattice.

Determinizes a raw state-level lattice, keeping only the best output-symbol sequence (typically transition ids) for each input-symbol sequence. This version does determinization only on the word lattice. The output is represented using either sequences of arcs (if compact_out == False), where all but the first one has an epsilon on the input side, or directly as strings using compact lattice weight type (if compact_out == True). It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup.

Parameters:
  • ifst (LatticeFst) – The input lattice.
  • prune (float) – The pruning beam.
  • opts (DeterminizeLatticePrunedOptions) – The options for lattice determinization.
  • compact_out (bool) – Whether to output a compact lattice.
Returns:

The output lattice.

Return type:

LatticeVectorFst or CompactLatticeVectorFst

kaldi.lat.functions.get_per_frame_acoustic_costs(linear_lattice:LatticeVectorFst) → Vector

Extracts per-frame log likelihoods from a linear lattice.

The size of the output vector will be set to the number of non-epsilon input symbols in linear_lattice. The elements of the output vector will be set to the second elements of the lattice weights, which represent the acoustic costs; you may want to scale this vector afterward by -1/acoustic_scale to get the original loglikes. If there are acoustic costs on input-epsilon arcs or the final-probs (and this should not normally be the case in situations where it makes sense to call this function), they will be added to the cost of the preceding input symbol, or the following input symbol for input-epsilons encountered prior to any input symbol. If linear_lattice has no input symbols, the output vector will be set to the empty vector.
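The suggested rescaling is a one-line operation. The sketch below applies it to a plain Python list standing in for the returned Vector; `costs_to_loglikes` is a hypothetical helper, and the acoustic scale of 0.1 is only an example value.

```python
from typing import List


def costs_to_loglikes(acoustic_costs: List[float], acoustic_scale: float) -> List[float]:
    """Undo the acoustic scaling and the sign flip: loglike = -cost / scale."""
    if acoustic_scale == 0.0:
        raise ValueError("acoustic_scale must be non-zero")
    return [-cost / acoustic_scale for cost in acoustic_costs]


# Per-frame acoustic costs from a lattice that was scaled by 0.1.
loglikes = costs_to_loglikes([0.5, 1.0, 0.25], acoustic_scale=0.1)
```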

kaldi.lat.functions.lattice_active_phones(lat:LatticeVectorFst, trans:TransitionModel, sil_phones:list<int>) → list<set<int>>

Computes the set of phones active on each frame.

Given a lattice, and a transition model to map pdf-ids to phones, outputs for each frame the set of phones active on that frame. If sil_phones (which must be sorted and unique) is non-empty, it excludes phones in this list.

kaldi.lat.functions.lattice_boost(trans:TransitionModel, alignment:list<int>, silence_phones:list<int>, b:float, max_silence_error:float, lat:LatticeVectorFst) → bool

Boosts graph scores in the lattice.

Boosts LM probabilities by b * [number of frame errors]; equivalently, adds -b*[number of frame errors] to the graph-component of the cost of each arc/path. There is a frame error if a particular transition-id on a particular frame corresponds to a phone not matching transcription’s alignment for that frame. This is used in “margin-inspired” discriminative training, esp. Boosted MMI. The transition model is used to map transition-ids in the lattice input-side to phones; the phones appearing in silence_phones are treated specially in that we replace the frame error f (either zero or 1) for a frame, with the minimum of f or max_silence_error. In the normal case, max_silence_error would be zero. Note that silence_phones must be sorted and unique.

Raises:ValueError – In case of failure.
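The frame-error rule can be worked through for a single path. The sketch below is pure Python over per-frame phone ids and does not touch a lattice; `boost_offset_for_path` is a hypothetical helper, and applying the silence rule to the lattice-side phone (rather than the reference phone) is an assumption.

```python
from typing import List


def boost_offset_for_path(
    lattice_phones: List[int],
    reference_phones: List[int],
    silence_phones: List[int],
    b: float,
    max_silence_error: float,
) -> float:
    """Return the total amount added to a path's graph cost: -b * frame errors."""
    silence = set(silence_phones)
    total_error = 0.0
    for hyp_phone, ref_phone in zip(lattice_phones, reference_phones):
        frame_error = 0.0 if hyp_phone == ref_phone else 1.0
        if hyp_phone in silence:
            # Assumption: the silence cap applies to the lattice-side phone.
            frame_error = min(frame_error, max_silence_error)
        total_error += frame_error
    return -b * total_error


# Frame 1 is a silence mismatch (capped to 0), frame 3 is a real error.
offset = boost_offset_for_path(
    lattice_phones=[1, 1, 2, 3],
    reference_phones=[1, 2, 2, 2],
    silence_phones=[1],
    b=0.1,
    max_silence_error=0.0,
)
```

Lowering the cost of paths with more frame errors makes competing hypotheses more likely, which is the "boosting" effect used in Boosted MMI.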
kaldi.lat.functions.lattice_forward_backward(lat:LatticeVectorFst) -> (total_prob:float, arc_post:list<list<tuple<int, float>>>, acoustic_like_sum:float)

Computes lattice arc posteriors using forward-backward algorithm.

Returns the total log-probability of the lattice, arc posteriors and the sum over the arcs, of the posterior of the arc times the acoustic likelihood [i.e. negated acoustic score] on that arc. The arc posteriors contain pairs of (transition-id, weight) for each frame.

kaldi.lat.functions.lattice_forward_backward_mmi(trans:TransitionModel, lat:LatticeVectorFst, num_ali:list<int>, drop_frames:bool, convert_to_pdf_ids:bool, cancel:bool) -> (objf_val:float, post:list<list<tuple<int, float>>>)

Computes lattice posteriors for MMI.

Parameters:
  • trans (TransitionModel) – The transition model. Used to map the transition-ids to phones or pdfs.
  • lat (LatticeVectorFst) – The denominator lattice
  • num_ali (List[int]) – The numerator alignment
  • drop_frames (bool) – If True, it will not compute any posteriors on frames where the num and den have disjoint pdf-ids.
  • convert_to_pdf_ids (bool) – If True, it will convert the output to be at the level of pdf-ids, not transition-ids.
  • cancel (bool) – If true, it will cancel out any positive and negative parts from the same transition-id (or pdf-id, if convert_to_pdf_ids == True).
Returns:

The forward-backward likelihood of the lattice and the MMI posteriors for transition-ids (or pdf-ids if convert_to_pdf_ids == True) at each frame i.e. the difference between the numerator and denominator posteriors.
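The "numerator minus denominator" structure of the output can be sketched without a lattice. In the plain-Python illustration below the denominator posteriors are given directly as per-frame (id, weight) lists; `mmi_posteriors` is a hypothetical helper, and the ids are treated as being at whatever level (transition-id or pdf-id) the caller chose, which simplifies the real drop_frames test on pdf-ids.

```python
from collections import defaultdict
from typing import List, Tuple

Posterior = List[List[Tuple[int, float]]]


def mmi_posteriors(
    num_ali: List[int],
    den_post: Posterior,
    drop_frames: bool = False,
    cancel: bool = True,
) -> Posterior:
    """Return per-frame numerator-minus-denominator posteriors."""
    result: Posterior = []
    for num_id, den_frame in zip(num_ali, den_post):
        den_ids = {idx for idx, _ in den_frame}
        if drop_frames and num_id not in den_ids:
            result.append([])  # numerator and denominator are disjoint
            continue
        if cancel:
            merged = defaultdict(float)
            merged[num_id] += 1.0
            for idx, weight in den_frame:
                merged[idx] -= weight
            frame = sorted((idx, w) for idx, w in merged.items() if w != 0.0)
        else:
            frame = [(num_id, 1.0)] + [(idx, -w) for idx, w in den_frame]
        result.append(frame)
    return result


num_ali = [3, 4]
den_post = [[(3, 0.75), (5, 0.25)], [(6, 1.0)]]
post = mmi_posteriors(num_ali, den_post)
post_dropped = mmi_posteriors(num_ali, den_post, drop_frames=True)
```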

kaldi.lat.functions.lattice_forward_backward_mpe_variants(trans:TransitionModel, silence_phones:list<int>, lat:LatticeVectorFst, num_ali:list<int>, criterion:str, one_silence_class:bool) -> (objf_val:float, post:list<list<tuple<int, float>>>)

Computes lattice posteriors for MPFE (or SMBR).

This function computes either the MPFE (minimum phone frame error) or sMBR (state-level minimum bayes risk) forward-backward, depending on whether criterion is "mpfe" or "smbr".

Parameters:
  • trans (TransitionModel) – The transition model. Used to map the transition-ids to phones or pdfs.
  • silence_phones (List[int]) – A list of integer ids of silence phones. The silence frames i.e. the frames where num_ali corresponds to a silence phone are treated specially. The behavior is determined by ‘one_silence_class’ being false (traditional behavior) or true. Usually in our setup, several phones including the silence, vocalized noise, non-spoken noise and unk are treated as “silence phones”.
  • lat (LatticeVectorFst) – The denominator lattice.
  • num_ali (List[int]) – The numerator alignment.
  • criterion (str) – The objective function. Must be “mpfe” or “smbr” for MPFE (minimum phone frame error) or sMBR (state-level minimum bayes risk) training.
  • one_silence_class (bool) – Determines how the silence frames are treated. Setting this to false gives the old traditional behavior, where the silence frames (according to num_ali) are treated as incorrect. However, this means that the insertions are not penalized by the objective. Setting this to true gives the new behaviour, where we treat silence as any other phone, except that all pdfs of silence phones are collapsed into a single class for the frame-error computation. This can possibly reduce the insertions in the trained model. This is closer to the WER metric that we actually care about, since WER is generally computed after filtering out noises, but does penalize insertions.
Returns:

The objective function value (MPFE or sMBR criterion) and the posteriors (which may be positive or negative).

kaldi.lat.functions.lattice_state_times(lat)

Extracts lattice state times (in terms of frames).

Iterates over the states of a topologically sorted lattice and computes the corresponding time instances.

Parameters:lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns:The number of frames and the state times.
Return type:Tuple[int, List[int]]

Note

If input is a regular lattice, the number of frames is equal to the maximum state time in the lattice. If input is a compact lattice, the number of frames might not be equal to the maximum state time in the lattice due to frames in final states.
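For the regular-lattice case, the computation can be sketched on a toy graph. This pure-Python illustration assumes hypothetical (src, dst, ilabel) arcs with src < dst, state 0 as the start state, and one frame consumed per non-epsilon input label; it does not cover the compact-lattice case with frames in final states.

```python
from typing import List, Tuple


def state_times(num_states: int, arcs: List[Tuple[int, int, int]]) -> Tuple[int, List[int]]:
    """Return (number of frames, per-state times) for a sorted regular lattice."""
    times = [-1] * num_states  # -1 marks states not reached yet
    times[0] = 0
    for src, dst, ilabel in sorted(arcs):
        if times[src] < 0:
            continue  # unreachable source state
        time = times[src] + (1 if ilabel != 0 else 0)
        if times[dst] < 0:
            times[dst] = time
        elif times[dst] != time:
            raise ValueError(f"inconsistent times for state {dst}")
    return max(times), times


# Two parallel two-frame paths, followed by an epsilon arc into state 4.
arcs = [(0, 1, 10), (0, 2, 11), (1, 3, 12), (2, 3, 13), (3, 4, 0)]
num_frames, times = state_times(5, arcs)
```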

kaldi.lat.functions.longest_sentence_length_in_lattice(lat)

Returns the number of words in the longest sentence in a lattice.

Parameters:lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns:The length of the longest sentence in the lattice.
Return type:int
kaldi.lat.functions.minimize_compact_lattice(clat:CompactLatticeMutableFst, delta:float=default) → bool

Applies a specialized minimization operation to compact lattices.

It is to be called after determinization and pushing. If the lattice is not determinized and pushed this function will not combine as many states as it could, but it won’t throw an exception. The output will be topologically sorted.

Returns:True on success, False if topological sorting fails.
kaldi.lat.functions.prune_lattice(beam, lat)

Prunes a lattice.

Parameters:
Raises:

ValueError – If pruning fails.

kaldi.lat.functions.push_compact_lattice_strings(clat:CompactLatticeMutableFst) → bool

Pushes the transition-ids as far towards the start as they will go.

It can be useful prior to word_align_lattice() (for non-linear lattices). We can’t use the generic OpenFst “push” function because it uses the sum as the divisor, which is not appropriate in this case (a+b generally won’t divide a or b in this semiring).

Returns:True on success, False if topological sorting fails.
kaldi.lat.functions.push_compact_lattice_weights(clat:CompactLatticeMutableFst) → bool

Pushes the weights in compact lattice toward the start state.

This function pushes the weights in the compact lattice so that all states except possibly the start state, have weight components (of type LatticeWeight) that “sum to one” in the lattice semiring (i.e. interpreting the weights as negated log-probs).

Returns:True on success, False if topological sorting fails.
kaldi.lat.functions.rescore_compact_lattice_speedup(tmodel:TransitionModel, speedup_factor:float, decodable:DecodableInterface, clat:CompactLatticeVectorFst) → bool

Adjusts acoustic scores in the compact lattice.

This function is like rescore_lattice(), but it avoids computing probabilities on most frames where all the pdf-ids are the same. It needs the transition model to work out whether two transition-ids map to the same pdf-id, and it assumes that the lattice has transition-ids on it. The naive approach would be to set all probabilities to zero on frames where all the pdf-ids are the same, because this value does not affect the lattice posterior. However, this would become confusing when computing corpus-level diagnostics such as the MMI objective function. Instead, given a speedup_factor (which must be >= 1.0; for example, 100), the function computes those likelihoods with probability 1.0 / speedup_factor and multiplies them by speedup_factor; otherwise it sets them to zero. This gives the right expected probability, so corpus-level diagnostics will be about right.
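
The expectation argument can be checked with a small pure-Python sketch. It is only an illustration of the sampling idea, not the library's code: subsampled_score is a hypothetical helper, and the score and speedup_factor values are arbitrary.

```python
import random


def subsampled_score(score, speedup_factor, rng):
    """Return speedup_factor * score with probability 1 / speedup_factor.

    Otherwise return 0.0. The expected value of the result equals score.
    """
    if speedup_factor < 1.0:
        raise ValueError("speedup_factor must be >= 1.0")
    if rng.random() < 1.0 / speedup_factor:
        return speedup_factor * score
    return 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
true_score = -2.0
samples = [subsampled_score(true_score, 4.0, rng) for _ in range(100000)]
average = sum(samples) / len(samples)  # close to true_score on average
```

Each individual sample is either zero or four times the true score, but the average over many frames approaches the true score, which is the property the corpus-level diagnostics rely on.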

kaldi.lat.functions.rescore_lattice(decodable, lat)[source]

Adjusts acoustic scores in the lattice.

This function adds the negated scores obtained from the decodable object to the acoustic scores on the arcs. If you want to replace them instead, first use scale_compact_lattice() to set the acoustic scores to zero. The input labels (or the string component of arc weights if the input is a compact lattice) are interpreted as transition-ids or whatever other index the decodable object expects.

Parameters:
Raises:

ValueError – If the inputs are not compatible.

kaldi.lat.functions.sentence_level_confidence(lat)[source]

Computes sentence level confidence scores.

If input is a compact lattice, this function requires that distinct paths in lat have distinct word sequences; this will automatically be the case if lat was generated by a decoder, since a deterministic FST has this property. If input is a state-level lattice, it is first determinized, but this is done in a “smart” way so that only paths needed for this operation are generated.

This function assumes that any acoustic scaling you want to apply, has already been applied.

The output consists of the following. confidence is the score difference between the best path and the second-best path in the lattice (a positive number), or zero if the lattice was equivalent to the empty FST (no successful paths), or infinity if there was only one path in the lattice. num_paths is a number in {0, 1, 2} saying how many n-best paths (up to two) were found. If num_paths >= 1, best_sentence is the best word sequence; if num_paths == 2, second_best_sentence is the second-best word sequence (this may be useful for testing whether the two best word sequences are somehow equivalent for the task at hand).

Parameters:lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns:The tuple (confidence, num_paths, best_sentence, second_best_sentence).
Return type:Tuple[float, int, List[int], List[int]]

Note

This function is not the only way to get confidences in Kaldi. It only gives you sentence-level (utterance-level) confidence. Word-by-word confidences within a sentence are available through Minimum Bayes Risk decoding (see kaldi.lat.sausages). Also, confidences estimated using this function are not very accurate.
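
The shape of the returned tuple can be illustrated with a pure-Python sketch that works on an explicit list of scored paths rather than on a lattice. It is not how the library computes the result. It assumes a hypothetical input format of (cost, words) pairs, one per distinct word sequence, where a lower cost means a better path.

```python
import math


def sentence_confidence(paths):
    """Mimic the (confidence, num_paths, best, second_best) output.

    paths is a list of (cost, words) pairs with distinct word sequences.
    Lower cost is assumed to be better.
    """
    ranked = sorted(paths, key=lambda path: path[0])[:2]
    num_paths = len(ranked)
    if num_paths == 0:
        return 0.0, 0, [], []  # empty lattice: no successful paths
    best_cost, best_words = ranked[0]
    if num_paths == 1:
        return math.inf, 1, list(best_words), []  # no competing path
    second_cost, second_words = ranked[1]
    confidence = second_cost - best_cost  # score gap between the two best
    return confidence, 2, list(best_words), list(second_words)


toy_paths = [(12.5, [3, 4]), (10.0, [3, 5]), (11.0, [6])]
toy_result = sentence_confidence(toy_paths)
```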

kaldi.lat.functions.top_sort_lattice_if_needed(lat)[source]

Topologically sorts the lattice if it is not already sorted.

Parameters:lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Raises:RuntimeError – If lattice cannot be topologically sorted.
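
The "sort only if needed, fail on cycles" behaviour can be sketched in pure Python as follows. This is a generic illustration using Kahn's algorithm on a toy adjacency map, not the library's implementation; top_sort_if_needed and its input format are hypothetical.

```python
from collections import deque


def top_sort_if_needed(arcs, num_states):
    """Return a state order in which every arc goes forward.

    arcs maps a state to a list of next states. order[i] is the old id of
    the state that would receive new id i. Raises RuntimeError on cycles.
    """
    # Already sorted: every arc points to a higher-numbered state.
    if all(nxt > s for s, nexts in arcs.items() for nxt in nexts):
        return list(range(num_states))
    indegree = [0] * num_states
    for nexts in arcs.values():
        for nxt in nexts:
            indegree[nxt] += 1
    queue = deque(s for s in range(num_states) if indegree[s] == 0)
    order = []
    while queue:
        state = queue.popleft()
        order.append(state)
        for nxt in arcs.get(state, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != num_states:
        raise RuntimeError("lattice cannot be topologically sorted")
    return order


unsorted_order = top_sort_if_needed({0: [2], 2: [1]}, 3)
```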

kaldi.lat.sausages

Classes

MinimumBayesRisk Minimum Bayes Risk decoding.
MinimumBayesRiskOptions Options for Minimum Bayes Risk decoding.
class kaldi.lat.sausages.MinimumBayesRisk(clat, opts=MinimumBayesRiskOptions())

Minimum Bayes Risk decoding.

This class does the word-level Minimum Bayes Risk computation, and gives you either the 1-best MBR output together with the expected Bayes Risk, or a sausage-like structure. The initial 1-best is set to the lattice 1-best.

Parameters:
get_bayes_risk() → float

Returns the expected WER over this sentence.

get_one_best() → list<int>

Returns one-best output (with no epsilons).

get_one_best_confidences() → list<float>

Returns the confidences for the one-best output.

get_one_best_times() → list<tuple<float, float>>

Returns average (start, end) times for bins of the one-best output.

This is just the appropriate subsequence of the times output by get_sausage_times().

get_sausage_stats() → list<list<tuple<int, float>>>

Returns the sausage statistics.

get_sausage_times() → list<tuple<float, float>>

Returns average (start, end) times for each bin.
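
The relationship between the sausage outputs and the one-best outputs can be illustrated with a pure-Python sketch on plain lists. It is not the library's implementation. It assumes that the one-best word of a bin is its most probable entry and that word-id 0 denotes epsilon; one_best_from_sausage is a hypothetical helper.

```python
def one_best_from_sausage(sausage_stats, sausage_times, epsilon=0):
    """Derive one-best words, confidences and times from sausage bins.

    sausage_stats is a list of bins, each a list of (word, probability)
    pairs. sausage_times holds one (start, end) pair per bin.
    """
    words, confidences, times = [], [], []
    for bin_stats, bin_times in zip(sausage_stats, sausage_times):
        # Assumption: the bin's top entry is its one-best word.
        word, prob = max(bin_stats, key=lambda entry: entry[1])
        if word == epsilon:
            continue  # epsilon bins are dropped from the one-best output
        words.append(word)
        confidences.append(prob)
        times.append(bin_times)
    return words, confidences, times


toy_stats = [[(7, 0.9), (0, 0.1)], [(0, 0.6), (8, 0.4)], [(9, 1.0)]]
toy_bin_times = [(0.0, 3.0), (3.0, 4.5), (4.5, 9.0)]
toy_one_best = one_best_from_sausage(toy_stats, toy_bin_times)
```

In the toy input the middle bin is won by epsilon, so the one-best times are the first and third entries of the per-bin times.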

new_with_words(clat:CompactLatticeVectorFst, words:list<int>, opts:MinimumBayesRiskOptions=default) → MinimumBayesRisk

Creates an instance using words as the initial 1-best.

Parameters:
new_with_words_times(clat:CompactLatticeVectorFst, words:list<int>, times:list<tuple<float, float>>, opts:MinimumBayesRiskOptions=default) → MinimumBayesRisk

Creates an instance using words and times as the initial 1-best.

Parameters:
class kaldi.lat.sausages.MinimumBayesRiskOptions

Options for Minimum Bayes Risk decoding.

decode_mbr

Whether to output MBR hypothesis.

print_silence

Whether the 1-best path will “keep” <eps> bins.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.