kaldi.lat
kaldi.lat.align
Functions
phone_align_lattice | Aligns the phone labels and transition-ids.
read_lexicon_for_word_align | Reads the lexicon in the special format required for word alignment.
test_word_aligned_lattice | Verifies the output of word_align_lattice.
word_align_lattice | Aligns the word labels and transition-ids.
word_align_lattice_lexicon | Aligns the word labels and transition-ids using a lexicon.
Classes
PhoneAlignLatticeOptions | Options for phone alignment.
WordAlignLatticeLexiconInfo | This class extracts some information from the lexicon and stores it in a suitable form for the word-alignment code to use.
WordAlignLatticeLexiconOpts | Options for word alignment using a lexicon.
WordBoundaryInfo | Word boundary information.
WordBoundaryInfoNewOpts | Options for word alignment using word boundary phones.
-
class kaldi.lat.align.PhoneAlignLatticeOptions
Options for phone alignment.
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
-
remove_epsilon
Whether to remove epsilon arcs from the phone lattice. If replace_output_symbols is False, this means that an arc can have multiple phones on it.
-
reorder
Whether the lattice was created from a graph with the reorder option set.
-
replace_output_symbols
Whether to replace output symbols (typically words) with phones.
-
-
class kaldi.lat.align.WordAlignLatticeLexiconInfo(lexicon)
This class extracts some information from the lexicon and stores it in a form suitable for the word-alignment code to use.
Parameters: lexicon (List[List[int]]) – The lexicon, in the format returned by read_lexicon_for_word_align.
-
equivalence_class_of(word:int) → int
Returns the equivalence class for the word.
This function is used in testing code.
Words are mapped into equivalence classes derived from the mappings in the first two fields of each line in the lexicon. This function maps from each word-id to the lowest member of its equivalence class.
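To make the equivalence-class mapping concrete, here is a small plain-Python sketch that links the first two fields of every lexicon entry with a union-find structure and maps each word-id to the smallest id in its class. It assumes the lexicon is a list of integer lists as returned by read_lexicon_for_word_align; the function name word_equivalence_classes is hypothetical, not part of kaldi.lat.align, and this is not necessarily how the underlying C++ code computes the mapping.

```python
from typing import Dict, List


def word_equivalence_classes(lexicon: List[List[int]]) -> Dict[int, int]:
    """Hypothetical helper: map each word-id to the lowest member of its class.

    Classes are formed by linking the first two fields (old-word-id,
    new-word-id) of every lexicon entry.
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for entry in lexicon:
        a, b = find(entry[0]), find(entry[1])
        if a != b:
            # Attach the larger root under the smaller one, so every root
            # stays the minimum word-id of its class.
            parent[max(a, b)] = min(a, b)
    return {word: find(word) for word in list(parent)}


# Words 0, 5 and 7 are linked through the entries (0, 5) and (5, 7).
lexicon = [[1, 1, 10], [0, 5, 3], [5, 7, 4], [2, 2, 8]]
print(word_equivalence_classes(lexicon))  # {1: 1, 0: 0, 5: 0, 7: 0, 2: 2}
```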
-
is_valid_entry(entry:list<int>) → bool
Checks whether an entry is valid.
This function is used in testing code.
Returns: True if the entry, interpreted as (output-word phone1 phone2 …), can appear in the lexicon.
-
-
class kaldi.lat.align.WordAlignLatticeLexiconOpts
Options for word alignment using a lexicon.
-
allow_duplicate_paths
Whether to allow duplicate paths in testing code.
-
max_expand
Maximum allowed ratio of the number of states in the aligned lattice to the number of states in the input lattice.
If > 0.0, this is the maximum ratio by which the lattice-alignment code may increase the number of states in a lattice (relative to the phone-aligned lattice) before it fails and refuses to align the lattice. This helps prevent ‘pathological’ lattices from causing the program to exhaust memory. The actual maximum number of states is 1000 + max-expand * orig-num-states, where orig-num-states is the number of states in the phone-aligned lattice.
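As a quick illustration of the state limit, the following sketch evaluates 1000 + max-expand * orig-num-states. The helper max_aligned_states is hypothetical, and truncating the result to an integer is an assumption.

```python
from typing import Optional


def max_aligned_states(num_phone_aligned_states: int, max_expand: float) -> Optional[int]:
    """Hypothetical helper: the state limit implied by max_expand.

    Returns None when max_expand <= 0.0, i.e. when no limit applies.
    """
    if max_expand <= 0.0:
        return None
    # 1000 + max-expand * orig-num-states, truncated to an integer (assumption).
    return int(1000 + max_expand * num_phone_aligned_states)


print(max_aligned_states(500, 10.0))  # 6000
print(max_aligned_states(500, 0.0))   # None
```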
-
partial_word_label
Label for partial word arcs at the end of “forced-out” utterances.
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
-
reorder
Whether the lattice was created from a graph with the reorder option set.
-
test
Whether to activate the testing code that validates the algorithm.
-
-
class kaldi.lat.align.WordBoundaryInfo(opts)
Word boundary information.
Parameters: opts (WordBoundaryInfoNewOpts) – The word boundary options.
-
PhoneType
Alias of WordBoundaryInfo.PhoneType.
-
from_file(opts:WordBoundaryInfoNewOpts, word_boundary_file:str) → WordBoundaryInfo
Creates a new WordBoundaryInfo object from a file.
-
init(is:istream)
Initializes with information read from an input stream.
-
partial_word_label
Label for partial word arcs at the end of “forced-out” utterances.
-
phone_to_type
Mapping from phone ids to phone types.
-
reorder
Whether the lattice was created from a graph with the reorder option set.
-
silence_label
Label for silence arcs.
-
-
class kaldi.lat.align.WordBoundaryInfoNewOpts
Options for word alignment using word boundary phones.
-
partial_word_label
Label for partial word arcs at the end of “forced-out” utterances.
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
-
reorder
Whether the lattice was created from a graph with the reorder option set.
-
silence_label
Label for silence arcs.
-
-
kaldi.lat.align.phone_align_lattice(lat, tmodel, opts)
Aligns the phone labels and transition-ids.
Outputs a lattice in which the arcs correspond exactly to sequences of phones, so the boundaries between the arcs correspond to the boundaries between phones.
Parameters:
- lat (CompactLatticeVectorFst) – The input lattice.
- tmodel (TransitionModel) – The transition model.
- opts (PhoneAlignLatticeOptions) – The phone alignment options.
Returns: A tuple of the return value and the output lattice. The return value is True if the operation was successful and False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the model.
Note
If this function returns False, it doesn’t necessarily mean the output lattice is bad. The input lattice may simply have been “forced out” with partial words because no final state was reached during decoding, in which case the output might still be usable.
Note
If opts.remove_epsilon == True and opts.replace_output_symbols == False, an arc may have more than one phone on it, but the boundaries will still correspond to the boundaries between phones.
Note
If opts.replace_output_symbols == False, it is possible to have arcs with words on them but no transition-ids at all.
-
kaldi.lat.align.read_lexicon_for_word_align(rxfilename)
Reads the lexicon in the special format required for word alignment.
Each line has a series of integers on it (at least two on each line), representing:
<old-word-id> <new-word-id> [<phone-id-1> [<phone-id-2> … ] ]
Here, <old-word-id> is the word-id that appears in the lattice before alignment, and <new-word-id> is the word-id that should appear in the lattice after alignment. This is mainly useful when the lattice may have no symbol on the optional-silence arcs (so <old-word-id> would equal zero), but we want those arcs to carry a symbol in the output (so <new-word-id> would be nonzero). If the silence should not be added to the lattice, both <old-word-id> and <new-word-id> may be zero.
Parameters: rxfilename (str) – Extended filename for reading the lexicon.
Returns: List[List[int]] – The lexicon in the format required for word alignment.
Raises: ValueError – If reading the lexicon fails.
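The lexicon format above can be illustrated with a small pure-Python parser. This is a hypothetical sketch (parse_word_align_lexicon is not a PyKaldi function): it reads lines from an in-memory iterable rather than an extended filename, skips blank lines (an assumption), and raises ValueError for lines that contain non-integer fields or fewer than two fields.

```python
from typing import Iterable, List


def parse_word_align_lexicon(lines: Iterable[str]) -> List[List[int]]:
    """Hypothetical parser for '<old-word-id> <new-word-id> [<phone-id> ...]' lines."""
    lexicon: List[List[int]] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        try:
            entry = [int(field) for field in fields]
        except ValueError:
            raise ValueError(f"line {lineno}: non-integer field in {line!r}") from None
        if len(entry) < 2:
            raise ValueError(f"line {lineno}: expected at least two integers")
        lexicon.append(entry)
    return lexicon


text = """\
1 1 10 11
0 5 3
0 0 3
"""
print(parse_word_align_lexicon(text.splitlines()))
# [[1, 1, 10, 11], [0, 5, 3], [0, 0, 3]]
```

In the example, the second line maps an unlabeled optional-silence arc (old word-id 0) to word-id 5, and the third line keeps the silence unlabeled.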
-
kaldi.lat.align.test_word_aligned_lattice(lat:CompactLatticeVectorFst, tmodel:TransitionModel, info:WordBoundaryInfo, aligned_lat:CompactLatticeVectorFst)
Verifies the output of word_align_lattice.
Parameters:
- lat (CompactLatticeVectorFst) – The input lattice.
- tmodel (TransitionModel) – The transition model.
- info (WordBoundaryInfo) – The word boundary information.
- aligned_lat (CompactLatticeVectorFst) – The word-aligned lattice.
Raises: RuntimeError – If verification fails.
-
kaldi.lat.align.word_align_lattice(lat, tmodel, info, max_states)
Aligns the word labels and transition-ids.
Aligns a compact lattice so that each arc has on it the transition-ids that correspond to the word on that arc. It is OK for the lattice to have epsilon arcs for optional silences.
Parameters:
- lat (CompactLatticeVectorFst) – The input lattice.
- tmodel (TransitionModel) – The transition model.
- info (WordBoundaryInfo) – The word boundary information.
- max_states (int) – Maximum number of states allowed in the output lattice. If max_states > 0 and the output would have more than max_states states, this function aborts the computation, returns False and outputs an empty lattice.
Returns: A tuple of the return value and the output lattice. The return value is True if the operation was successful and False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the word boundary information.
Note
We don’t expect silence inside words, or empty words (words with no phones), and we expect each word to start with a wbegin_phone, to end with a wend_phone, and possibly to have winternal_phones inside (or to consist of just one wbegin_and_end_phone).
Note
If this function returns False, it doesn’t necessarily mean the output lattice is bad. The input lattice may simply have been “forced out” with partial words because no final state was reached during decoding, in which case the output might still be usable.
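The word structure expected by the first note can be expressed as a simple check over phone types. The sketch below uses a hypothetical PhoneType enum whose members mirror the names in the note; it is not WordBoundaryInfo.PhoneType, and the mapping from phone ids to types is made up for the example.

```python
from enum import Enum, auto
from typing import Dict, Sequence


class PhoneType(Enum):
    """Hypothetical stand-in for the phone categories named in the note."""
    NONWORD = auto()        # e.g. silence
    BEGIN = auto()          # wbegin_phone
    END = auto()            # wend_phone
    INTERNAL = auto()       # winternal_phone
    BEGIN_AND_END = auto()  # wbegin_and_end_phone


def is_well_formed_word(phones: Sequence[int], phone_to_type: Dict[int, PhoneType]) -> bool:
    """Checks for 'begin internal* end' or a single begin-and-end phone."""
    types = [phone_to_type[p] for p in phones]
    if not types:
        return False  # empty words are not expected
    if len(types) == 1:
        return types[0] is PhoneType.BEGIN_AND_END
    return (
        types[0] is PhoneType.BEGIN
        and types[-1] is PhoneType.END
        # silence (NONWORD) inside a word fails this check
        and all(t is PhoneType.INTERNAL for t in types[1:-1])
    )


phone_to_type = {1: PhoneType.NONWORD, 2: PhoneType.BEGIN, 3: PhoneType.INTERNAL,
                 4: PhoneType.END, 5: PhoneType.BEGIN_AND_END}
print(is_well_formed_word([2, 3, 4], phone_to_type))  # True
print(is_well_formed_word([2, 1, 4], phone_to_type))  # False: silence inside a word
```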
-
kaldi.lat.align.word_align_lattice_lexicon(lat, tmodel, lexicon_info, opts)
Aligns the word labels and transition-ids using a lexicon.
Aligns a compact lattice so that each arc has on it the transition-ids that correspond to the word on that arc. It is OK for the lattice to have epsilon arcs for optional silences.
Parameters:
- lat (CompactLatticeVectorFst) – The input lattice.
- tmodel (TransitionModel) – The transition model.
- lexicon_info (WordAlignLatticeLexiconInfo) – The lexicon information.
- opts (WordAlignLatticeLexiconOpts) – The word alignment options.
Returns: A tuple of the return value and the output lattice. The return value is True if the operation was successful and False if some kind of problem was detected, e.g. transition-id sequences in the lattice were incompatible with the lexicon information.
Note
If this function returns False, it doesn’t necessarily mean the output lattice is bad. The input lattice may simply have been “forced out” with partial words because no final state was reached during decoding, in which case the output might still be usable.
kaldi.lat.functions
Functions
add_word_ins_pen_to_compact_lattice | Adds the penalty term to the graph scores of arcs in the lattice.
compact_lattice_depth | Computes the depth of the compact lattice.
compact_lattice_depth_per_frame | Computes the per-frame depth of the compact lattice.
compact_lattice_limit_depth | Limits the depth of the compact lattice.
compact_lattice_shortest_path | Computes the shortest path in an acyclic compact lattice.
compact_lattice_to_word_alignment | Extracts word alignment from a linear compact lattice.
compact_lattice_to_word_prons | Extracts word pronunciations from a linear compact lattice.
compose_compact_lattice_deterministic | Composes a compact lattice with a deterministic on-demand FST.
compose_compact_lattice_pruned | Does pruned composition of a lattice and a deterministic on-demand FST.
compute_compact_lattice_alphas | Computes the forward scores (alpha) for compact lattice states.
compute_compact_lattice_betas | Computes the backward scores (beta) for compact lattice states.
compute_lattice_alphas_and_betas | Computes forward and backward scores for lattice states.
convert_compact_lattice_to_phones | Replaces transition-ids in compact lattice with phones.
convert_lattice_to_phones | Replaces output symbols in lattice with phones.
determinize_lattice_phone_pruned | Applies a specialized determinization operation to a lattice.
determinize_lattice_pruned | Applies a specialized determinization operation to a lattice.
get_per_frame_acoustic_costs | Extracts per-frame log likelihoods from a linear lattice.
lattice_active_phones | Computes the set of phones active on each frame.
lattice_boost | Boosts graph scores in the lattice.
lattice_forward_backward | Computes lattice arc posteriors using the forward-backward algorithm.
lattice_forward_backward_mmi | Computes lattice posteriors for MMI.
lattice_forward_backward_mpe_variants | Computes lattice posteriors for MPFE (or sMBR).
lattice_state_times | Extracts lattice state times (in terms of frames).
longest_sentence_length_in_lattice | Returns the number of words in the longest sentence in a lattice.
minimize_compact_lattice | Applies a specialized minimization operation to compact lattices.
prune_lattice | Prunes a lattice.
push_compact_lattice_strings | Pushes the transition-ids as far towards the start as they will go.
push_compact_lattice_weights | Pushes the weights in compact lattice toward the start state.
rescore_compact_lattice_speedup | Adjusts acoustic scores in the compact lattice.
rescore_lattice | Adjusts acoustic scores in the lattice.
sentence_level_confidence | Computes sentence level confidence scores.
top_sort_lattice_if_needed | Topologically sorts the lattice if it is not already sorted.
Classes
ComposeLatticePrunedOptions | Options for pruned lattice composition.
DeterminizeLatticePhonePrunedOptions | Options for pruning and phone+word determinizing a lattice.
DeterminizeLatticePrunedOptions | Options for pruning and word determinizing a lattice.
-
class kaldi.lat.functions.ComposeLatticePrunedOptions
Options for pruned lattice composition.
-
growth_ratio
Determines how much num-arcs can grow on each outer iteration (default=1.5).
-
initial_num_arcs
Number of arcs used on the first outer iteration (default=100).
-
lattice_compose_beam
Beam width explored during composition (default=6.0).
-
max_arcs
Maximum number of arcs to expand (default=100000).
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
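One way to picture how initial_num_arcs, growth_ratio and max_arcs interact is as a schedule of arc budgets for the outer iterations. The sketch below assumes geometric growth capped at max_arcs, which is an interpretation of the option descriptions above rather than a transcription of the Kaldi implementation; arc_budget_schedule is a hypothetical name.

```python
from typing import List


def arc_budget_schedule(initial_num_arcs: int = 100, growth_ratio: float = 1.5,
                        max_arcs: int = 100000) -> List[int]:
    """Hypothetical: per-iteration arc budgets under capped geometric growth."""
    if growth_ratio <= 1.0:
        raise ValueError("growth_ratio must be > 1.0 for the budget to grow")
    budgets = [min(initial_num_arcs, max_arcs)]
    while budgets[-1] < max_arcs:
        # Grow by at least one arc so the loop always terminates.
        grown = max(budgets[-1] + 1, int(budgets[-1] * growth_ratio))
        budgets.append(min(grown, max_arcs))
    return budgets


schedule = arc_budget_schedule()  # documented defaults
print(schedule[:4], "...", schedule[-1], f"({len(schedule)} iterations)")
```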
-
-
class kaldi.lat.functions.DeterminizeLatticePhonePrunedOptions
Options for pruning and phone+word determinizing a lattice.
-
delta
A small offset used to measure equality of weights.
-
max_mem
Maximum memory threshold for the determinization operation.
If > 0, determinization will fail and return false when the algorithm’s (approximate) memory consumption crosses this threshold.
-
minimize
Whether to push and minimize the output after determinization.
-
phone_determinize
Whether to do a first pass determinization on both phones and words.
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
-
word_determinize
Whether to do a second pass determinization on words only.
-
-
class kaldi.lat.functions.DeterminizeLatticePrunedOptions
Options for pruning and word determinizing a lattice.
-
delta
A small offset used to measure equality of weights.
-
max_arcs
Maximum number of arcs allowed in the output FST.
-
max_loop
Maximum loop threshold for the determinization operation.
If > 0, can be used to detect non-determinizable input (a case that wouldn’t be caught by max_mem).
-
max_mem
Maximum memory threshold for the determinization operation.
If > 0, determinization will fail and return false when the algorithm’s (approximate) memory consumption crosses this threshold.
-
max_states
Maximum number of states allowed in the output FST.
-
register(opts:OptionsItf)
Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
-
retry_cutoff
Cutoff value used when retrying a failed determinization operation.
Controls pruning of the un-determinized lattice and retrying of determinization: if effective-beam < retry-cutoff * beam, the raw lattice is pruned and determinization is retried. This avoids ever getting empty output for long segments.
-
-
kaldi.lat.functions.add_word_ins_pen_to_compact_lattice(word_ins_penalty:float, clat:CompactLatticeVectorFst)
Adds the penalty term to the graph scores of arcs in the lattice.
-
kaldi.lat.functions.compact_lattice_depth(clat:CompactLatticeVectorFst) -> (depth:float, num_frames:int)
Computes the depth of the compact lattice.
Returns the depth of the lattice, defined as the average number of arcs (or final-prob strings) crossing any given frame, and the number of frames. Lattice depth is 1 for empty lattices. Requires the lattice to be topologically sorted.
-
kaldi.lat.functions.compact_lattice_depth_per_frame(clat:CompactLatticeVectorFst) → list<int>
Computes the per-frame depth of the compact lattice.
Returns the per-frame depth of the lattice, defined as the number of arcs (or final-prob strings) crossing any given frame. Requires the lattice to be topologically sorted.
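The two depth definitions can be sketched over a toy representation in which each arc string or final-prob string is given as a (start_frame, num_frames) span. The representation and function names are hypothetical; the real functions read this information from a topologically sorted CompactLatticeVectorFst. Because a string covering n frames crosses exactly n frames, the average depth is the sum of string lengths divided by the number of frames.

```python
from typing import List, Tuple

# Hypothetical representation: one (start_frame, num_frames) span per arc
# string or final-prob string, in a lattice covering num_frames frames.
Span = Tuple[int, int]


def depth_per_frame(spans: List[Span], num_frames: int) -> List[int]:
    """Number of arc/final-prob strings crossing each frame."""
    depth = [0] * num_frames
    for start, length in spans:
        for t in range(start, start + length):
            depth[t] += 1
    return depth


def lattice_depth(spans: List[Span], num_frames: int) -> float:
    """Average per-frame depth; 1.0 for an empty lattice, as documented."""
    if num_frames == 0:
        return 1.0
    return sum(length for _, length in spans) / num_frames


spans = [(0, 2), (0, 1), (1, 1), (2, 1)]
print(depth_per_frame(spans, 3))  # [2, 2, 1]
print(lattice_depth(spans, 3))    # 1.6666666666666667
```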
-
kaldi.lat.functions.compact_lattice_limit_depth(max_arcs_per_frame:int, clat:CompactLatticeVectorFst)
Limits the depth of the compact lattice.
Does not allow more than a specified number of arcs active on any given frame. This can be used to reduce the size of the “very deep” portions of the lattice.
-
kaldi.lat.functions.compact_lattice_shortest_path(clat:CompactLatticeVectorFst) → CompactLatticeVectorFst
Computes the shortest path in an acyclic compact lattice.
-
kaldi.lat.functions.compact_lattice_to_word_alignment(clat:CompactLatticeVectorFst) -> (success:bool, words:list<int>, begin_times:list<int>, lengths:list<int>)
Extracts word alignment from a linear compact lattice.
This function takes a compact lattice that should contain only a single linear sequence (e.g. the output of compact_lattice_shortest_path()) and that should have been processed so that the arcs align correctly with the word boundaries (e.g. by word_align_lattice()). Besides the success flag, it outputs 3 lists of the same size, which represent, for each word in the lattice (in sequence), the word label and the begin time and length in frames. This is done even for zero (epsilon) words, which generally correspond to optional silence; if you don’t want them, just ignore them in the output.
Raises: ValueError – If the lattice does not have the correct format (e.g. if it is empty or if it is not linear).
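The begin times and lengths returned by compact_lattice_to_word_alignment follow from accumulating arc durations along the linear lattice. A minimal sketch, assuming the lattice has already been reduced to hypothetical (word_id, num_frames) pairs, one per word-aligned arc; word_alignment is an illustrative name, not a PyKaldi function:

```python
from typing import List, Tuple


def word_alignment(arcs: List[Tuple[int, int]]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical sketch: arcs are (word_id, num_frames) along a linear, word-aligned lattice."""
    words: List[int] = []
    begin_times: List[int] = []
    lengths: List[int] = []
    time = 0
    for word, num_frames in arcs:
        words.append(word)
        begin_times.append(time)
        lengths.append(num_frames)
        time += num_frames  # the next word starts where this one ends
    return words, begin_times, lengths


words, begins, lengths = word_alignment([(0, 5), (12, 10), (7, 8)])
# Drop epsilon (word-id 0) entries, e.g. optional silence, if they are not wanted.
kept = [(w, b, n) for w, b, n in zip(words, begins, lengths) if w != 0]
print(kept)  # [(12, 5, 10), (7, 15, 8)]
```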
-
kaldi.lat.functions.compact_lattice_to_word_prons(tmodel:TransitionModel, clat:CompactLatticeVectorFst) -> (success:bool, words:list<int>, begin_times:list<int>, lengths:list<int>, prons:list<list<int>>, phone_lengths:list<list<int>>)
Extracts word pronunciations from a linear compact lattice.
This function takes a compact lattice that should contain only a single linear sequence (e.g. the output of compact_lattice_shortest_path()) and that should have been processed so that the arcs align correctly with the word boundaries (e.g. by word_align_lattice()). Besides the success flag, it outputs lists of the same size, which represent, for each word in the lattice (in sequence), the word label, the begin time, the length in frames, the pronunciation (sequence of phones) and the length in frames of each of those phones. This is done even for zero (epsilon) words, which correspond to optional silences; if you don’t want them, just ignore them in the output.
Raises: ValueError – If the lattice does not have the correct format (e.g. if it is empty or if it is not linear).
-
kaldi.lat.functions.compose_compact_lattice_deterministic(clat:CompactLatticeVectorFst, det_fst:StdDeterministicOnDemandFst) → CompactLatticeVectorFst
Composes a compact lattice with a deterministic on-demand FST.
This function is used in language model rescoring. Composition affects only graph costs. The output is another compact lattice.
-
kaldi.lat.functions.compose_compact_lattice_pruned(opts:ComposeLatticePrunedOptions, clat:CompactLatticeVectorFst, det_fst:StdDeterministicOnDemandFst) → CompactLatticeVectorFst
Does pruned composition of a lattice and a deterministic on-demand FST.
Parameters:
- opts (ComposeLatticePrunedOptions) – Options for pruned lattice composition.
- clat (CompactLatticeVectorFst) – Input lattice.
- det_fst (StdDeterministicOnDemandFst) – Input deterministic on-demand FST.
Returns: Output lattice.
Return type: CompactLatticeVectorFst
-
kaldi.lat.functions.compute_compact_lattice_alphas(clat:CompactLatticeVectorFst) -> (success:bool, alpha:list<float>)
Computes the forward scores (alpha) for compact lattice states.
-
kaldi.lat.functions.compute_compact_lattice_betas(clat:CompactLatticeVectorFst) -> (success:bool, beta:list<float>)
Computes the backward scores (beta) for compact lattice states.
-
kaldi.lat.functions.compute_lattice_alphas_and_betas(lat, viterbi)
Computes forward and backward scores for lattice states.
If viterbi == True, computes the Viterbi scores, i.e. the forward (alpha) and backward (beta) scores are the scores of the best paths reaching and leaving each state. Otherwise, computes regular forward and backward scores. Note that alphas and betas are negated costs. Requires the input lattice to be topologically sorted.
Parameters:
- lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
- viterbi (bool) – Whether to compute Viterbi scores.
Returns: The total-prob (or best-path prob), the forward (alpha) scores and the backward (beta) scores.
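The recursions behind compute_lattice_alphas_and_betas can be sketched on a toy acyclic lattice. This is a simplified illustration under stated assumptions: states are numbered in topological order with 0 as the start state, each arc weight is collapsed into a single float cost (real LatticeWeight values have separate graph and acoustic parts), and all names are hypothetical. Scores are negated costs, combined with max in the Viterbi case and with log-add otherwise.

```python
import math
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def alphas_and_betas(
    num_states: int,
    arcs: List[Tuple[int, int, float]],  # (src, dst, cost) with src < dst
    final_costs: Dict[int, float],       # final state -> final cost
    viterbi: bool,
) -> Tuple[float, List[float], List[float]]:
    """Forward/backward scores as negated costs; state 0 is the start state."""
    combine = max if viterbi else log_add
    alpha = [NEG_INF] * num_states
    alpha[0] = 0.0
    # With src < dst for every arc, visiting arcs in increasing src order
    # finishes alpha[src] before it is propagated forward.
    for src, dst, cost in sorted(arcs):
        alpha[dst] = combine(alpha[dst], alpha[src] - cost)
    beta = [-final_costs.get(s, math.inf) for s in range(num_states)]
    # Visiting arcs in decreasing src order finishes beta[dst] first.
    for src, dst, cost in sorted(arcs, reverse=True):
        beta[src] = combine(beta[src], beta[dst] - cost)
    total = NEG_INF
    for state, final_cost in final_costs.items():
        total = combine(total, alpha[state] - final_cost)
    return total, alpha, beta


arcs = [(0, 1, 1.0), (0, 1, 2.0)]
print(alphas_and_betas(2, arcs, {1: 0.0}, viterbi=True))
print(alphas_and_betas(2, arcs, {1: 0.0}, viterbi=False))
```

In both modes, beta[0] equals the returned total, which is a handy consistency check.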
-
kaldi.lat.functions.convert_compact_lattice_to_phones(trans_model:TransitionModel, lat:CompactLatticeVectorFst)
Replaces transition-ids in compact lattice with phones.
Given a lattice, and a transition model to map pdf-ids to phones, replaces the sequences of transition-ids with sequences of phones. Note that this is different from convert_lattice_to_phones(), in that it replaces the transition-ids, not the words.
-
kaldi.lat.functions.convert_lattice_to_phones(trans_model:TransitionModel, lat:LatticeVectorFst)
Replaces output symbols in lattice with phones.
Given a lattice, and a transition model to map pdf-ids to phones, replaces the output symbols (presumably words) with phones. Uses trans_model to work out the phone sequence. Note that the phone labels are not exactly aligned with the phone boundaries: inserted phone labels coincide with any transition to the final, nonemitting state of a phone (this state always exists). This would be the last transition-id in the phone if reordering were not done (but reordering is typically done).
-
kaldi.lat.functions.determinize_lattice_phone_pruned(ifst, trans_model, prune, opts=None, destructive=True)
Applies a specialized determinization operation to a lattice.
Determinizes a raw state-level lattice, keeping only the best output-symbol sequence (typically transition ids) for each input-symbol sequence. This version does phone insertion when doing a first pass determinization (if opts.phone_determinize == True), then removes the inserted phones and does a second pass determinization on the word lattice (if opts.word_determinize == True). It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup.
Parameters:
- ifst (LatticeFst) – The input lattice.
- trans_model (TransitionModel) – The transition model.
- prune (float) – The pruning beam.
- opts (DeterminizeLatticePhonePrunedOptions) – The options for lattice determinization.
- destructive (bool) – Whether to use the destructive version of the algorithm, which mutates the input lattice.
Returns: The output lattice.
Note
The point of doing first a phone-level determinization pass and then a word-level determinization pass is that it allows us to determinize deeper lattices without “failing early” and returning a too-small lattice due to the max-mem constraint. The result should be the same as word-level determinization in general, but for deeper lattices it is a bit faster, despite the fact that we now have two passes of determinization by default.
-
kaldi.lat.functions.determinize_lattice_pruned(ifst, prune, opts=None, compact_out=True)
Applies a specialized determinization operation to a lattice.
Determinizes a raw state-level lattice, keeping only the best output-symbol sequence (typically transition ids) for each input-symbol sequence. This version does determinization only on the word lattice. The output is represented either using sequences of arcs (if compact_out == False), where all but the first one have an epsilon on the input side, or directly as strings using the compact lattice weight type (if compact_out == True). It also does pruning as part of the determinization algorithm, which is more efficient and prevents blowup.
Parameters:
- ifst (LatticeFst) – The input lattice.
- prune (float) – The pruning beam.
- opts (DeterminizeLatticePrunedOptions) – The options for lattice determinization.
- compact_out (bool) – Whether to output a compact lattice.
Returns: The output lattice.
-
kaldi.lat.functions.get_per_frame_acoustic_costs(linear_lattice:LatticeVectorFst) → Vector
Extracts per-frame log likelihoods from a linear lattice.
The size of the output vector will be set to the number of non-epsilon input symbols in linear_lattice. The elements of the output vector will be set to the second elements of the lattice weights, which represent the acoustic costs; you may want to scale this vector afterward by -1/acoustic_scale to get the original loglikes. If there are acoustic costs on input-epsilon arcs or the final-probs (and this should not normally be the case in situations where it makes sense to call this function), they will be added to the cost of the preceding input symbol, or to the cost of the following input symbol for input-epsilons encountered prior to any input symbol. If linear_lattice has no input symbols, the output vector will be empty.
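The bookkeeping for input-epsilon arcs and final-probs described above can be sketched in plain Python. The linear lattice is represented here as hypothetical (ilabel, acoustic_cost) pairs plus a final acoustic cost, and the function names are illustrative only. The second helper applies the -1/acoustic_scale scaling to turn costs back into log-likelihoods.

```python
from typing import List, Tuple


def per_frame_acoustic_costs(arcs: List[Tuple[int, float]],
                             final_acoustic_cost: float = 0.0) -> List[float]:
    """Hypothetical sketch over a linear lattice given as (ilabel, acoustic_cost) arcs."""
    costs: List[float] = []
    pending = 0.0  # input-epsilon costs seen before the first input symbol
    for ilabel, acoustic_cost in arcs:
        if ilabel != 0:
            costs.append(acoustic_cost + pending)
            pending = 0.0
        elif costs:
            costs[-1] += acoustic_cost  # epsilon after a symbol: add to the preceding symbol
        else:
            pending += acoustic_cost  # epsilon before any symbol: add to the following symbol
    if costs:
        costs[-1] += final_acoustic_cost  # final-prob cost goes to the last symbol
    return costs


def to_loglikes(costs: List[float], acoustic_scale: float) -> List[float]:
    """Scale acoustic costs by -1/acoustic_scale to recover log-likelihoods."""
    return [-c / acoustic_scale for c in costs]


costs = per_frame_acoustic_costs([(0, 0.5), (3, 1.0), (0, 0.25), (4, 2.0)], 0.1)
print(costs)                    # approximately [1.75, 2.1]
print(to_loglikes(costs, 0.1))  # approximately [-17.5, -21.0]
```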
-
kaldi.lat.functions.lattice_active_phones(lat:LatticeVectorFst, trans:TransitionModel, sil_phones:list<int>) → list<set<int>>
Computes the set of phones active on each frame.
Given a lattice, and a transition model to map pdf-ids to phones, outputs for each frame the set of phones active on that frame. If sil_phones (which must be sorted and unique) is non-empty, the phones in this list are excluded.
-
kaldi.lat.functions.lattice_boost(trans:TransitionModel, alignment:list<int>, silence_phones:list<int>, b:float, max_silence_error:float, lat:LatticeVectorFst) → bool
Boosts graph scores in the lattice.
Boosts LM probabilities by b * [number of frame errors]; equivalently, adds -b * [number of frame errors] to the graph component of the cost of each arc/path. There is a frame error if a particular transition-id on a particular frame corresponds to a phone that does not match the transcription’s alignment for that frame. This is used in “margin-inspired” discriminative training, esp. Boosted MMI. The transition model is used to map transition-ids on the input side of the lattice to phones; the phones appearing in silence_phones are treated specially in that we replace the frame error f (either zero or 1) for a frame with the minimum of f and max_silence_error. In the normal case, max_silence_error would be zero. Note that silence_phones must be sorted and unique.
Raises: ValueError – In case of failure.
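A per-path sketch of the boosting rule: count frame errors, cap the error on silence frames at max_silence_error, and subtract b times the error count from the graph cost. Phone sequences are given directly here instead of being derived from transition-ids via the transition model, the function names are hypothetical, and checking the lattice-side (hypothesized) phone against silence_phones is an assumption. A Python set stands in for the sorted, unique silence_phones list.

```python
from typing import List, Set


def frame_error(hyp_phone: int, ref_phone: int, silence_phones: Set[int],
                max_silence_error: float) -> float:
    """0/1 frame error, capped at max_silence_error for silence phones."""
    f = 0.0 if hyp_phone == ref_phone else 1.0
    if hyp_phone in silence_phones:  # assumption: the lattice-side phone is checked
        f = min(f, max_silence_error)
    return f


def boosted_graph_cost(graph_cost: float, hyp_phones: List[int], ref_phones: List[int],
                       silence_phones: Set[int], b: float, max_silence_error: float) -> float:
    """Adds -b * [number of frame errors] to the graph cost of one path."""
    errors = sum(frame_error(h, r, silence_phones, max_silence_error)
                 for h, r in zip(hyp_phones, ref_phones))
    return graph_cost - b * errors


# The third frame is an error; the fourth is also wrong but hypothesizes
# silence, so with max_silence_error = 0.0 it does not count.
print(boosted_graph_cost(10.0, [1, 2, 2, 9], [1, 2, 3, 3], {9}, b=0.5, max_silence_error=0.0))  # 9.5
```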
-
kaldi.lat.functions.lattice_forward_backward(lat:LatticeVectorFst) -> (total_prob:float, arc_post:list<list<tuple<int, float>>>, acoustic_like_sum:float)
Computes lattice arc posteriors using the forward-backward algorithm.
Returns the total log-probability of the lattice, the arc posteriors, and the sum, over the arcs, of the posterior of the arc times the acoustic likelihood [i.e. negated acoustic score] on that arc. The arc posteriors contain (transition-id, weight) pairs for each frame.
-
kaldi.lat.functions.lattice_forward_backward_mmi(trans:TransitionModel, lat:LatticeVectorFst, num_ali:list<int>, drop_frames:bool, convert_to_pdf_ids:bool, cancel:bool) -> (objf_val:float, post:list<list<tuple<int, float>>>)
Computes lattice posteriors for MMI.
Parameters:
- trans (TransitionModel) – The transition model. Used to map the transition-ids to phones or pdfs.
- lat (LatticeVectorFst) – The denominator lattice.
- num_ali (List[int]) – The numerator alignment.
- drop_frames (bool) – If True, no posteriors are computed on frames where the numerator and denominator have disjoint pdf-ids.
- convert_to_pdf_ids (bool) – If True, the output is converted to the level of pdf-ids instead of transition-ids.
- cancel (bool) – If True, any positive and negative parts from the same transition-id (or pdf-id, if convert_to_pdf_ids == True) are canceled out.
Returns: The forward-backward likelihood of the lattice and the MMI posteriors for transition-ids (or pdf-ids if convert_to_pdf_ids == True) at each frame, i.e. the difference between the numerator and denominator posteriors.
-
kaldi.lat.functions.lattice_forward_backward_mpe_variants(trans:TransitionModel, silence_phones:list<int>, lat:LatticeVectorFst, num_ali:list<int>, criterion:str, one_silence_class:bool) -> (objf_val:float, post:list<list<tuple<int, float>>>)
Computes lattice posteriors for MPFE (or sMBR).
This function computes either the MPFE (minimum phone frame error) or sMBR (state-level minimum Bayes risk) forward-backward, depending on whether criterion is "mpfe" or "smbr".
Parameters:
- trans (TransitionModel) – The transition model. Used to map the transition-ids to phones or pdfs.
- silence_phones (List[int]) – A list of integer ids of silence phones. The silence frames, i.e. the frames where num_ali corresponds to a silence phone, are treated specially. The behavior is determined by one_silence_class being False (traditional behavior) or True. Usually in our setup, several phones including silence, vocalized noise, non-spoken noise and unk are treated as “silence phones”.
- lat (LatticeVectorFst) – The denominator lattice.
- num_ali (List[int]) – The numerator alignment.
- criterion (str) – The objective function. Must be "mpfe" or "smbr" for MPFE (minimum phone frame error) or sMBR (state-level minimum Bayes risk) training.
- one_silence_class (bool) – Determines how the silence frames are treated. Setting this to False gives the old traditional behavior, where the silence frames (according to num_ali) are treated as incorrect. However, this means that insertions are not penalized by the objective. Setting this to True gives the new behavior, where silence is treated like any other phone, except that all pdfs of silence phones are collapsed into a single class for the frame-error computation. This can possibly reduce insertions in the trained model. This is closer to the WER metric that we actually care about, since WER is generally computed after filtering out noises, but does penalize insertions.
Returns: The objective function value (MPFE or sMBR criterion) and the posteriors (which may be positive or negative).
-
kaldi.lat.functions.lattice_state_times(lat)
Extracts lattice state times (in terms of frames).
Iterates over the states of a topologically sorted lattice and computes the corresponding time instances.
Parameters: lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns: The number of frames and the state times.
Return type: Tuple[int, List[int]]
Note
If the input is a regular lattice, the number of frames is equal to the maximum state time in the lattice. If the input is a compact lattice, the number of frames might not be equal to the maximum state time in the lattice, because final weights can carry frames.
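For a regular lattice, state times can be sketched as a single pass over topologically sorted states in which every non-epsilon input label advances time by one frame. The (src, dst, ilabel) arc representation below is hypothetical, state_times is an illustrative name, and the sketch assumes all states are reachable from the start state.

```python
from typing import List, Tuple


def state_times(num_states: int, arcs: List[Tuple[int, int, int]]) -> Tuple[int, List[int]]:
    """Hypothetical sketch for a regular lattice.

    arcs are (src, dst, ilabel) with src < dst (topologically sorted), state 0 is
    the start state, and every state is assumed reachable from it.
    """
    times = [-1] * num_states
    times[0] = 0
    for src, dst, ilabel in sorted(arcs):
        # Each non-epsilon input label (transition-id) consumes one frame.
        t = times[src] + (1 if ilabel != 0 else 0)
        if times[dst] == -1:
            times[dst] = t
        elif times[dst] != t:
            raise ValueError(f"state {dst} is reached at different times")
    return max(times), times


print(state_times(4, [(0, 1, 5), (1, 2, 0), (1, 3, 6), (2, 3, 7)]))  # (2, [0, 1, 1, 2])
```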
-
kaldi.lat.functions.longest_sentence_length_in_lattice(lat)
Returns the number of words in the longest sentence in a lattice.
Parameters: lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns: The length of the longest sentence in the lattice.
Return type: int
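Finding the longest sentence amounts to a longest-path computation over an acyclic lattice, counting only non-epsilon output labels. A minimal dynamic-programming sketch with a hypothetical (src, dst, olabel) arc representation and topologically numbered states; returning 0 when no final state is reachable is an assumption of this sketch:

```python
from typing import List, Optional, Set, Tuple


def longest_sentence_length(num_states: int, arcs: List[Tuple[int, int, int]],
                            final_states: Set[int]) -> int:
    """Hypothetical sketch: longest path in words from the start state (0) to a final state.

    arcs are (src, dst, olabel) with src < dst; olabel 0 is epsilon (no word).
    """
    best: List[Optional[int]] = [None] * num_states
    best[0] = 0
    for src, dst, olabel in sorted(arcs):
        if best[src] is None:
            continue  # src is unreachable from the start state
        length = best[src] + (1 if olabel != 0 else 0)
        if best[dst] is None or length > best[dst]:
            best[dst] = length
    return max((best[s] for s in final_states if best[s] is not None), default=0)


arcs = [(0, 1, 10), (0, 2, 0), (1, 3, 11), (2, 3, 0), (3, 4, 12)]
print(longest_sentence_length(5, arcs, {4}))  # 3
```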
-
kaldi.lat.functions.minimize_compact_lattice(clat:CompactLatticeMutableFst, delta:float=default) → bool
Applies a specialized minimization operation to compact lattices.
It is to be called after determinization and pushing. If the lattice is not determinized and pushed, this function will not combine as many states as it could, but it won’t throw an exception. The output will be topologically sorted.
Returns: True on success, False if topological sorting fails.
-
kaldi.lat.functions.
prune_lattice
(beam, lat)[source]¶ Prunes a lattice.
Parameters: - beam (float) – The pruning beam.
- lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Raises: ValueError – If pruning fails.
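A common way to implement beam pruning, and a reasonable mental model for this function, is forward-backward pruning. An arc is kept only if the best complete path through it costs at most the best path cost plus the beam. The plain-Python sketch below assumes an acyclic, topologically sorted toy lattice with single-number costs. toy_prune and its data layout are hypothetical and not PyKaldi's API.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy lattice: arcs[s] holds (label, cost, nextstate) triples with
# costs as negated log-probs, finals maps final states to their final cost, and
# state ids are in topological order with 0 as the start state.
ToyArc = Tuple[int, float, int]


def toy_prune(num_states: int, arcs: Dict[int, List[ToyArc]],
              finals: Dict[int, float], beam: float) -> Dict[int, List[ToyArc]]:
    """Keep only arcs on some path whose total cost is within beam of the best."""
    alpha = [math.inf] * num_states  # best cost from the start to each state
    alpha[0] = 0.0
    for s in range(num_states):
        for _, cost, n in arcs.get(s, []):
            alpha[n] = min(alpha[n], alpha[s] + cost)
    beta = [math.inf] * num_states  # best cost from each state to a final state
    for s in reversed(range(num_states)):
        beta[s] = finals.get(s, math.inf)
        for _, cost, n in arcs.get(s, []):
            beta[s] = min(beta[s], cost + beta[n])
    if math.isinf(beta[0]):
        # This sketch treats a lattice with no successful path as a failure.
        raise ValueError("lattice has no successful path")
    cutoff = beta[0] + beam
    return {s: [(lab, c, n) for lab, c, n in arcs.get(s, [])
                if alpha[s] + c + beta[n] <= cutoff]
            for s in range(num_states)}


arcs = {0: [(1, 1.0, 1), (2, 4.0, 1)], 1: [(3, 0.5, 2)]}
print(toy_prune(3, arcs, {2: 0.0}, beam=2.0))
# {0: [(1, 1.0, 1)], 1: [(3, 0.5, 2)], 2: []}
```

The best path costs 1.5, so with a beam of 2.0 the arc whose best path costs 4.5 is removed; a wider beam would keep it.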
-
kaldi.lat.functions.
push_compact_lattice_strings
(clat:CompactLatticeMutableFst) → bool¶ Pushes the transition-ids as far towards the start as they will go.
It can be useful prior to word_align_lattice() (for non-linear lattices). We can’t use the generic OpenFst “push” function because it uses the sum as the divisor, which is not appropriate in this case (a+b generally won’t divide a or b in this semiring).
Returns: True on success, False if topological sorting fails.
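The idea of moving transition-ids toward the start can be sketched in plain Python. The approach is to visit states from last to first, find the longest common prefix of the strings leaving each non-start state (including its final string), strip that prefix, and append it to every arc entering the state. This is an illustrative sketch, not the Kaldi implementation. The toy representation and the helpers common_prefix and toy_push_strings are hypothetical, and weights are left out.

```python
from typing import Dict, List

# Hypothetical toy compact lattice: arcs[s] holds [word, tids, nextstate] lists
# (mutable so they can be edited in place), finals maps each final state to the
# transition-ids on its final weight, and states are in topological order with
# 0 as the start state. Weights are omitted for brevity.


def common_prefix(seqs: List[List[int]]) -> List[int]:
    prefix = list(seqs[0])
    for seq in seqs[1:]:
        n = 0
        while n < min(len(prefix), len(seq)) and prefix[n] == seq[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def toy_push_strings(num_states: int, arcs: Dict[int, list],
                     finals: Dict[int, List[int]]) -> None:
    incoming: Dict[int, list] = {s: [] for s in range(num_states)}
    for s in range(num_states):
        for arc in arcs.get(s, []):
            incoming[arc[2]].append(arc)
    # Visit states from last to first so prefixes keep moving toward the start.
    for s in reversed(range(1, num_states)):
        leaving = [arc[1] for arc in arcs.get(s, [])]
        if s in finals:
            leaving.append(finals[s])
        if not leaving:
            continue
        prefix = common_prefix(leaving)
        if not prefix:
            continue
        k = len(prefix)
        for arc in arcs.get(s, []):
            arc[1] = arc[1][k:]  # strip the shared prefix from leaving arcs
        if s in finals:
            finals[s] = finals[s][k:]
        for arc in incoming[s]:
            arc[1] = arc[1] + prefix  # and append it to every entering arc


arcs = {0: [[10, [1, 2], 1]], 1: [[11, [3, 4, 5], 2], [12, [3, 4, 6], 3]]}
finals = {2: [], 3: []}
toy_push_strings(4, arcs, finals)
print(arcs)  # {0: [[10, [1, 2, 3, 4], 1]], 1: [[11, [5], 2], [12, [6], 3]]}
```

The shared transition-ids 3 and 4 end up on the arc leaving the start state, which is the "as far towards the start as they will go" behavior described above.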
-
kaldi.lat.functions.
push_compact_lattice_weights
(clat:CompactLatticeMutableFst) → bool¶ Pushes the weights in compact lattice toward the start state.
This function pushes the weights in the compact lattice so that all states, except possibly the start state, have weight components (of type LatticeWeight) that “sum to one” in the lattice semiring (i.e., interpreting the weights as negated log-probs).
Returns: True on success, False if topological sorting fails.
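The "sum to one" condition can be illustrated with a plain-Python weight-pushing sketch. It assumes, as with LatticeWeight, that the semiring "sum" picks the lower-cost weight, so "sum to one" means the cheapest way out of every non-start state costs 0. The toy representation and toy_push_weights are hypothetical, and each cost is a single number, whereas a real LatticeWeight has two components.

```python
import math
from typing import Dict

# Hypothetical toy lattice: arcs[s] holds [label, cost, nextstate] lists, finals
# maps final states to final costs, states are in topological order with 0 as
# the start state, and every state can reach a final state.


def toy_push_weights(num_states: int, arcs: Dict[int, list],
                     finals: Dict[int, float]) -> None:
    # beta[s]: cheapest cost from s to the end, using min as the semiring "sum".
    beta = [math.inf] * num_states
    for s in reversed(range(num_states)):
        beta[s] = finals.get(s, math.inf)
        for _, cost, n in arcs.get(s, []):
            beta[s] = min(beta[s], cost + beta[n])
    for s in range(num_states):
        # The start state keeps the total cost; other states are normalized.
        offset = 0.0 if s == 0 else beta[s]
        for arc in arcs.get(s, []):
            arc[1] = arc[1] + beta[arc[2]] - offset
        if s in finals:
            finals[s] = finals[s] - offset


arcs = {0: [[1, 1.0, 1], [2, 2.0, 2]], 1: [[3, 3.0, 3], [4, 5.0, 3]],
        2: [[5, 0.5, 3]]}
finals = {3: 1.0}
toy_push_weights(4, arcs, finals)
print(arcs, finals)
# {0: [[1, 5.0, 1], [2, 3.5, 2]], 1: [[3, 0.0, 3], [4, 2.0, 3]], 2: [[5, 0.0, 3]]} {3: 0.0}
```

After pushing, the cheapest option at states 1, 2 and 3 costs exactly 0, while the start state's arcs carry the total path costs (the best one being 3.5).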
-
kaldi.lat.functions.
rescore_compact_lattice_speedup
(tmodel:TransitionModel, speedup_factor:float, decodable:DecodableInterface, clat:CompactLatticeVectorFst) → bool¶ Adjusts acoustic scores in the compact lattice.
This function is like rescore_lattice(), but it avoids computing probabilities on most frames where all the pdf-ids are the same. It needs the transition model to work out whether two transition-ids map to the same pdf-id, and it assumes that the lattice has transition-ids on it. The naive approach would be to set all probabilities to zero on frames where all the pdf-ids are the same (because this value won’t affect the lattice posterior), but this would be confusing when computing corpus-level diagnostics such as the MMI objective function. Instead, for speedup_factor = 100 (it must be >= 1.0), with probability 1.0 / speedup_factor we compute those likelihoods and multiply them by speedup_factor; otherwise we set them to zero. This gives the right expected probability, so corpus-level diagnostics will be about right.
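The sampling trick can be checked numerically with a plain-Python sketch that works on per-frame scores instead of a lattice. The helper toy_speedup_scores and its inputs are hypothetical. frame_loglikes stands for the per-frame acoustic log-likelihoods, and same_pdf flags the frames where all transition-ids map to one pdf-id.

```python
import random
from typing import List


def toy_speedup_scores(frame_loglikes: List[float], same_pdf: List[bool],
                       speedup_factor: float, rng: random.Random) -> List[float]:
    if speedup_factor < 1.0:
        raise ValueError("speedup_factor must be >= 1.0")
    keep_prob = 1.0 / speedup_factor
    scores = []
    for loglike, all_same in zip(frame_loglikes, same_pdf):
        if not all_same:
            scores.append(loglike)  # pdf-ids differ: always computed
        elif rng.random() < keep_prob:
            scores.append(loglike * speedup_factor)  # sampled: scaled up
        else:
            scores.append(0.0)  # skipped frame
    return scores


rng = random.Random(0)
draws = [toy_speedup_scores([-2.0], [True], 4.0, rng)[0] for _ in range(20000)]
print(sum(draws) / len(draws))  # close to -2.0 on average
```

Each sampled frame is either 0.0 or four times its true score, but the average over many draws stays close to the true score, which is why corpus-level diagnostics remain approximately correct.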
-
kaldi.lat.functions.
rescore_lattice
(decodable, lat)¶ Adjusts acoustic scores in the lattice.
This function adds the negated scores obtained from the decodable object to the acoustic scores on the arcs. If you want to replace them instead, first use scale_compact_lattice() to set the acoustic scores to zero. The input labels (or the string components of the arc weights, if the input is a compact lattice) are interpreted as transition-ids or whatever other index the decodable object expects.
Parameters: - decodable (DecodableInterface) – The decodable object.
- lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Raises: ValueError – If the inputs are not compatible.
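The additive behavior can be illustrated with a plain-Python sketch. Everything here is hypothetical: toy_rescore works on a toy arc list rather than a LatticeVectorFst, and a lookup table stands in for the DecodableInterface. The sketch assumes that each non-epsilon input label is looked up at the frame where its arc starts, and that the negated score is added to the existing acoustic cost.

```python
from typing import Callable, Dict

# Hypothetical toy lattice: arcs[s] holds [ilabel, graph_cost, acoustic_cost,
# nextstate] lists, states are in topological order with 0 as the start state,
# every state is reachable, and ilabel 0 is epsilon. loglike(frame, index)
# stands in for a decodable object's log-likelihood lookup.


def toy_rescore(num_states: int, arcs: Dict[int, list],
                loglike: Callable[[int, int], float]) -> None:
    times = [0] * num_states  # frame index of each state
    for s in range(num_states):
        for ilabel, _, _, n in arcs.get(s, []):
            times[n] = times[s] + (1 if ilabel != 0 else 0)
    for s in range(num_states):
        for arc in arcs.get(s, []):
            if arc[0] != 0:
                # Add the negated log-likelihood for this frame and index.
                arc[2] += -loglike(times[s], arc[0])


table = {(0, 5): -2.0, (1, 7): -3.0}
arcs = {0: [[5, 1.0, 0.0, 1]], 1: [[0, 0.5, 0.0, 2]], 2: [[7, 0.0, 0.0, 3]]}
toy_rescore(4, arcs, lambda frame, tid: table[(frame, tid)])
print(arcs)  # {0: [[5, 1.0, 2.0, 1]], 1: [[0, 0.5, 0.0, 2]], 2: [[7, 0.0, 3.0, 3]]}
```

Running toy_rescore twice would add the scores twice, which is why the acoustic scores must be zeroed first if you want replacement rather than accumulation.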
-
kaldi.lat.functions.
sentence_level_confidence
(lat)¶ Computes sentence level confidence scores.
If the input is a compact lattice, this function requires that distinct paths in lat have distinct word sequences; this will automatically be the case if lat was generated by a decoder, since a deterministic FST has this property. If the input is a state-level lattice, it is first determinized, but this is done in a “smart” way so that only the paths needed for this operation are generated.
This function assumes that any acoustic scaling you want to apply has already been applied.
The output consists of the following:
- confidence is the score difference between the best path and the second-best path in the lattice (a positive number), or zero if the lattice was equivalent to the empty FST (no successful paths), or infinity if there was only one path in the lattice.
- num_paths is a number in {0, 1, 2} saying how many n-best paths (up to two) were found.
- If num_paths >= 1, best_sentence is the best word sequence.
- If num_paths == 2, second_best_sentence is the second-best word sequence (this may be useful for testing whether the two best word sequences are somehow equivalent for the task at hand).
Parameters: lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice.
Returns: The tuple (confidence, num_paths, best_sentence, second_best_sentence). Return type: Tuple[float, int, List[int], List[int]]
Note
This function is not the only way to get confidences in Kaldi. It only gives you sentence-level (utterance-level) confidence. You can get word-by-word confidences within a sentence as a by-product of Minimum Bayes Risk decoding. Also, confidences estimated using this function are not very accurate.
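The definitions of confidence and num_paths can be illustrated with a plain-Python sketch. It starts from an already extracted list of (cost, word sequence) pairs, which stands in for the two-best search that the real function performs on the lattice. toy_sentence_confidence is hypothetical. Costs are assumed to be negated log-probabilities with acoustic scaling already applied, word sequences are assumed distinct, and returning empty lists for missing sentences is a choice of this sketch.

```python
import math
from typing import List, Tuple


def toy_sentence_confidence(nbest: List[Tuple[float, List[int]]]
                            ) -> Tuple[float, int, List[int], List[int]]:
    """Return (confidence, num_paths, best_sentence, second_best_sentence)."""
    ranked = sorted(nbest, key=lambda item: item[0])[:2]  # lowest cost first
    num_paths = len(ranked)
    if num_paths == 0:
        return 0.0, 0, [], []  # no successful paths
    if num_paths == 1:
        return math.inf, 1, ranked[0][1], []  # only one path
    confidence = ranked[1][0] - ranked[0][0]  # cost gap, never negative
    return confidence, 2, ranked[0][1], ranked[1][1]


print(toy_sentence_confidence([(12.5, [3, 4]), (10.0, [3, 5]), (15.0, [6])]))
# (2.5, 2, [3, 5], [3, 4])
```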
-
kaldi.lat.functions.
top_sort_lattice_if_needed
(lat)¶ Topologically sorts the lattice if it is not already sorted.
Parameters: lat (LatticeVectorFst or CompactLatticeVectorFst) – The input lattice. Raises: RuntimeError – If the lattice cannot be topologically sorted.
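The "sort only when needed, fail on cycles" behavior can be sketched with Kahn's algorithm on a bare adjacency list. toy_top_sort_if_needed is hypothetical. It returns a visiting order rather than renumbering the states of a real FST, and it treats a graph as already sorted when every arc goes to a higher state id.

```python
from collections import deque
from typing import Dict, List


def toy_top_sort_if_needed(num_states: int, arcs: Dict[int, List[int]]) -> List[int]:
    """Return a topological state order; arcs[s] lists the destinations of s."""
    if all(n > s for s in arcs for n in arcs[s]):
        return list(range(num_states))  # already sorted: nothing to do
    indegree = [0] * num_states
    for s in arcs:
        for n in arcs[s]:
            indegree[n] += 1
    queue = deque(s for s in range(num_states) if indegree[s] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for n in arcs.get(s, []):
            indegree[n] -= 1
            if indegree[n] == 0:
                queue.append(n)
    if len(order) != num_states:
        # Some states were never freed: the graph contains a cycle.
        raise RuntimeError("lattice cannot be topologically sorted")
    return order


print(toy_top_sort_if_needed(3, {0: [2], 2: [1]}))  # [0, 2, 1]
```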
kaldi.lat.sausages¶
Classes
MinimumBayesRisk |
Minimum Bayes Risk decoding. |
MinimumBayesRiskOptions |
Options for Minimum Bayes Risk decoding. |
-
class
kaldi.lat.sausages.
MinimumBayesRisk
(clat, opts=MinimumBayesRiskOptions())¶ Minimum Bayes Risk decoding.
This class does the word-level Minimum Bayes Risk computation and gives you either the 1-best MBR output together with the expected Bayes risk, or a sausage-like structure. The initial 1-best is set to the lattice 1-best.
Parameters: - clat (CompactLatticeVectorFst) – The input lattice.
- opts (MinimumBayesRiskOptions) – The MBR options.
-
get_bayes_risk
() → float¶ Returns the expected WER over this sentence.
-
get_one_best
() → list<int>¶ Returns the one-best output (with no epsilons).
-
get_one_best_confidences
() → list<float>¶ Returns the confidences for the one-best output.
-
get_one_best_times
() → list<tuple<float, float>>¶ Returns the average (start, end) times for the bins of the one-best output.
This is just the appropriate subsequence of the times output by get_sausage_times().
-
get_sausage_stats
() → list<list<tuple<int, float>>>¶ Returns the sausage statistics: for each bin, a list of (word-id, posterior) pairs.
-
get_sausage_times
() → list<tuple<float, float>>¶ Returns the average (start, end) times for each bin.
-
new_with_words
(clat:CompactLatticeVectorFst, words:list<int>, opts:MinimumBayesRiskOptions=default) → MinimumBayesRisk¶ Creates an instance using words as the initial 1-best.
Parameters: - clat (CompactLatticeVectorFst) – The input lattice.
- words (List[int]) – Initial best word sequence.
- opts (MinimumBayesRiskOptions) – The MBR options.
-
new_with_words_times
(clat:CompactLatticeVectorFst, words:list<int>, times:list<tuple<float, float>>, opts:MinimumBayesRiskOptions=default) → MinimumBayesRisk¶ Creates an instance using words and times as the initial 1-best.
Parameters: - clat (CompactLatticeVectorFst) – The input lattice.
- words (List[int]) – Initial best word sequence.
- times (List[Tuple[float, float]]) – Initial times for the bins.
- opts (MinimumBayesRiskOptions) – The MBR options.
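The relationship between get_sausage_stats(), get_one_best(), get_one_best_confidences() and get_one_best_times() can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the class's implementation. It assumes that word id 0 marks an empty (<eps>) bin, that the 1-best takes the highest-posterior word in each bin, and that a word's confidence is its posterior in that bin. toy_one_best_from_sausage is a hypothetical helper, and the numbers are made up.

```python
from typing import List, Tuple

EPS = 0  # assumption: word id 0 marks an empty (<eps>) bin


def toy_one_best_from_sausage(stats: List[List[Tuple[int, float]]],
                              times: List[Tuple[float, float]]):
    """Return (words, confidences, times) of the 1-best, skipping <eps> bins."""
    words, confidences, best_times = [], [], []
    for bin_stats, bin_time in zip(stats, times):
        word, posterior = max(bin_stats, key=lambda wp: wp[1])
        if word == EPS:
            continue  # the 1-best output has no epsilons
        words.append(word)
        confidences.append(posterior)
        best_times.append(bin_time)
    return words, confidences, best_times


stats = [[(5, 0.9), (0, 0.1)], [(0, 0.7), (8, 0.3)], [(9, 0.6), (10, 0.4)]]
times = [(0.0, 0.5), (0.5, 0.6), (0.6, 1.2)]
print(toy_one_best_from_sausage(stats, times))
# ([5, 9], [0.9, 0.6], [(0.0, 0.5), (0.6, 1.2)])
```

The one-best times are a subsequence of the per-bin times, matching the description of get_one_best_times().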
-
class
kaldi.lat.sausages.
MinimumBayesRiskOptions
¶ Options for Minimum Bayes Risk decoding.
-
decode_mbr
¶ Whether to output the MBR hypothesis.
-
print_silence
¶ Whether the 1-best path will “keep” <eps> bins.
-
register
(opts:OptionsItf)¶ Registers options with an object implementing the options interface.
Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
-