kaldi.chain

Functions

add_weight_to_supervision_fst Adds weights to supervision FST by composing it with normalization FST.
alignment_to_proto_supervision Creates a proto supervision from lists of phones and durations.
alignment_to_proto_supervision_with_phones_durs Creates a proto supervision from a list of (phone, duration) pairs.
compute_chain_objf_and_deriv Does both the numerator and denominator parts of the chain computation in one call.
compute_fst_state_times Computes the times for FST states.
convert_supervision_to_unconstrained Converts supervision to an unconstrained supervision.
create_denominator_fst Creates denominator graph.
get_weights_for_ranges This function gets the weights for the derivatives.
map_fst_to_pdf_ids_plus_one Converts transition-ids in input FST to pdf-ids plus one.
minimize_acceptor_no_push Minimizes acceptor without weight pushing.
phone_lattice_to_proto_supervision Creates a proto supervision from a phone-aligned phone lattice.
proto_supervision_to_supervision Creates a Supervision object from a ProtoSupervision object.
sort_breadth_first_search Sorts the states of the Fst in a breadth-first search order.
split_into_ranges Pseudo-randomly splits a sequence of length num_frames into pieces of length exactly frames_per_range.
training_graph_to_supervision_e2e Creates and initializes an end-to-end supervision object from training FST.

Classes

ChainTrainingOptions Options for chain training.
DenominatorComputation Denominator computer used in chain training.
DenominatorGraph Denominator graph.
DenominatorGraphTransition Denominator graph transition.
GenericNumeratorComputation Numerator computer used in end-to-end chain training.
LanguageModelEstimator Language model estimator.
LanguageModelOptions Options for language model estimation.
NumeratorComputation Numerator computer used in chain training.
ProtoSupervision Proto supervision that is compiled into supervision.
Supervision Supervision information.
SupervisionOptions Supervision options.
SupervisionSplitter Supervision splitter.
TimeEnforcerFst Deterministic on-demand FST to limit the frames each phone is allowed.
class kaldi.chain.ChainTrainingOptions

Options for chain training.

l2_regularize

L2 regularization constant on the ‘chain’ output (default=0.0).

leaky_hmm_coefficient

Coefficient for ‘Leaky HMM’ (default=1.0e-05).

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
xent_regularize

Cross-entropy regularization constant (default=0.0).

class kaldi.chain.DenominatorComputation

Denominator computer used in chain training.

This does forward-backward in parallel on a number of sequences, using a single HMM.
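
As a rough illustration of the forward half of that computation, the following pure-Python sketch runs a forward pass over a single sequence on a toy HMM. It does not use PyKaldi; the function name, the tuple layout of `transitions`, and the choice to let every state end the sequence are assumptions made for the example, and the leaky-HMM term and batching over sequences are left out.

```python
import math

def forward_log_likelihood(initial_probs, transitions, nnet_output):
    """Toy forward pass over one sequence (illustrative only).

    initial_probs: list of float, one per HMM state.
    transitions: list of (src_state, dst_state, pdf_id, prob) tuples.
    nnet_output: list of frames, each a list of log-scores indexed by pdf_id.
    """
    alpha = list(initial_probs)
    log_scale = 0.0  # accumulated log of the per-frame normalizers
    for frame in nnet_output:
        next_alpha = [0.0] * len(alpha)
        for src, dst, pdf_id, prob in transitions:
            next_alpha[dst] += alpha[src] * prob * math.exp(frame[pdf_id])
        total = sum(next_alpha)
        # Renormalize after each frame to avoid underflow on long sequences.
        alpha = [a / total for a in next_alpha]
        log_scale += math.log(total)
    # After the last renormalization alpha sums to one, so the total
    # log-likelihood over all paths is just the accumulated scale.
    return log_scale
```

Per-frame renormalization is only one way to keep the recursion numerically stable; a log-domain recursion would work equally well in a sketch like this.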

Parameters:
  • opts (ChainTrainingOptions) – Options for chain training
  • den_graph (DenominatorGraph) – The HMM to use for denominator
  • num_sequences (int) – Number of separate time sequences to work with
  • nnet_output (CuMatrix) – The output of the neural network for this minibatch
backward(deriv_weight:float, nnet_output_deriv:CuMatrixBase) → bool

Does the backward computation.

Parameters:
  • deriv_weight (float) – Weight for the derivative of the log-prob wrt nnet output.
  • nnet_output_deriv (CuMatrixBase) – Output matrix.
Returns:

False if a failure is detected, True otherwise.

Return type:

bool

forward() → float

Does the forward computation.

Returns:Total negated log-likelihood summed over all sequences.
Return type:float
class kaldi.chain.DenominatorGraph

Denominator graph.

This class is responsible for storing the FST that we use as the ‘anti-model’ or ‘denominator-model’, which models all possible phone sequences (or most possible phone sequences, depending on how we built it). It stores the FST in a format where we can access both the transitions out of each state and the transitions into each state.
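
The two-way access described above can be pictured with a small pure-Python sketch. It indexes a toy arc list by source and by destination state; the function name and the `(src, dst, label, prob)` tuple layout are assumptions for illustration and say nothing about how the class stores its data internally.

```python
from collections import defaultdict

def index_transitions(arcs):
    """Index arcs both by source state and by destination state.

    arcs: list of (src_state, dst_state, label, prob), where label is
    pdf-id + 1 as in the denominator FST (toy input, not a real FST).
    """
    forward = defaultdict(list)   # state -> transitions leaving it
    backward = defaultdict(list)  # state -> transitions entering it
    for src, dst, label, prob in arcs:
        pdf_id = label - 1  # undo the +1 offset used on FST labels
        forward[src].append((dst, pdf_id, prob))
        backward[dst].append((src, pdf_id, prob))
    return forward, backward

arcs = [(0, 1, 1, 0.5), (0, 0, 2, 0.5), (1, 0, 3, 1.0)]
forward, backward = index_transitions(arcs)
```

Each stored entry carries a state, a pdf-id and a probability, which mirrors the three attributes of DenominatorGraphTransition.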

Parameters:
  • fst (StdVectorFst) – The denominator model FST. It should be an epsilon-free acceptor with labels representing pdf-ids + 1.
  • num_pdfs (int) – The number of PDFs. Used only for checking.

Note

Supports both GPU and non-GPU operation, but is optimized for GPU.

get_normalization_fst(ifst:StdVectorFst) → StdVectorFst

Outputs the normalization FST.

initial_probs() → CuVector

Returns the initial probabilities of HMM states.

num_pdfs() → int

Returns number of PDFs in the HMM.

num_states() → int

Returns number of states in the HMM.

scale_initial_probs(s:float)

Scales initial probabilities of HMM states.

class kaldi.chain.DenominatorGraphTransition

Denominator graph transition.

hmm_state

HMM state

pdf_id

PDF id

transition_prob

Transition probability

class kaldi.chain.GenericNumeratorComputation

Numerator computer used in end-to-end chain training.

This does forward-backward of the end-to-end ‘supervision’ (numerator) FSTs. This kind of FST can have self-loops.

Parameters:
  • supervision (Supervision) – Supervision for this minibatch
  • nnet_output (CuMatrix) – The output of the neural network for this minibatch
compute_objf() → float

Computes the objective function.

forward_backward(total_loglike:float, nnet_output_deriv:CuMatrixBase) → bool

Does the forward-backward computation.

class kaldi.chain.LanguageModelEstimator(opts:LanguageModelOptions)

Language model estimator.

This estimates an n-gram language model with a kind of ‘hard’ backoff that is intended to reduce the number of arcs in the final compiled FST. Basically, we never back off to the lower-order n-gram state, but we sometimes just say, “this state’s count is too small so we won’t have this state at all”, and this LM state disappears and transitions to it go to the lower-order n-gram’s state.

This language model is implemented as a set of states, and transitions between these states; there is no concept of a backoff transition here. Because this maps very naturally to an FST, we output it as an FST.
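
To make the 'hard' backoff idea concrete, here is a toy pure-Python estimator. It is a sketch under assumptions, not the estimator's actual algorithm: the function name, the use of 0 as an end-of-sentence symbol, and the `min_count` pruning threshold are all hypothetical. A history that is seen too rarely simply does not become a state, and transitions that would have led to it go to the next shorter history that survived.

```python
from collections import Counter, defaultdict

def estimate_hard_backoff(sentences, ngram_order=3, min_count=2):
    """Toy n-gram estimator with 'hard' backoff (illustrative only).

    sentences: lists of positive ints; 0 is used here as end-of-sentence.
    min_count: hypothetical threshold; rarer histories are dropped
    (the empty history is always kept).
    Returns {history: {symbol: (prob, destination_history)}}.
    """
    max_hist = ngram_order - 1
    counts = defaultdict(Counter)
    for sentence in sentences:
        symbols = list(sentence) + [0]
        for i, sym in enumerate(symbols):
            # Count the symbol under every history length available here.
            for hist_len in range(min(i, max_hist) + 1):
                counts[tuple(symbols[i - hist_len:i])][sym] += 1

    kept = {h for h, c in counts.items()
            if len(h) == 0 or sum(c.values()) >= min_count}

    def resolve(history):
        # Drop the oldest symbols until we reach a state that was kept.
        while history not in kept:
            history = history[1:]
        return history

    lm = {}
    for hist in kept:
        total = sum(counts[hist].values())
        lm[hist] = {}
        for sym, count in counts[hist].items():
            if sym == 0:
                dest = None  # end of sentence: no destination state
            else:
                longer = hist + (sym,)
                dest = resolve(longer[-max_hist:] if max_hist > 0 else ())
            lm[hist][sym] = (count / total, dest)
    return lm
```

Because every kept state normalizes over its own counts, there are no backoff arcs: the result is a plain set of states and weighted transitions, which is the shape that maps directly onto an FST.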

Parameters:opts (LanguageModelOptions) – Options for Language model estimation.
add_counts(sentence:list<int>)

Adds counts for input sentence.

Parameters:sentence (List[int]) – Input sentence. It should not contain zeros.
estimate() → StdVectorFst

Estimates the LM.

Returns:Output LM as an FST.
Return type:StdVectorFst
class kaldi.chain.LanguageModelOptions

Options for language model estimation.

These options are for an un-smoothed (phonetic) language model of a certain order (e.g. triphone) used as the ‘denominator graph’ in acoustic model estimation. The reason for avoiding smoothing is to reduce the number of transitions in the language model, which will improve the efficiency of training.

ngram_order

n-gram order for the (phonetic) language model

no_prune_ngram_order

The n-gram order below which the language model is not pruned

num_extra_lm_states

Desired number of extra LM states to keep for long n-grams

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
class kaldi.chain.NumeratorComputation

Numerator computer used in chain training.

This does forward-backward of the ‘supervision’ (numerator) FSTs.

Parameters:
  • supervision (Supervision) – Supervision for this minibatch
  • nnet_output (CuMatrix) – The output of the neural network for this minibatch
backward(nnet_output_deriv:CuMatrixBase)

Does the backward computation.

Parameters:nnet_output_deriv (CuMatrixBase) – Output matrix.
forward() → float

Does the forward computation.

Returns:Total log-prob multiplied by supervision.weight.
Return type:float
class kaldi.chain.ProtoSupervision

Proto supervision that is compiled into supervision.

allowed_phones

Phones allowed at each frame.

fst

The FST of phones; an epsilon-free acceptor.

write(os:ostream, binary:bool)

Writes to output stream for debugging.

class kaldi.chain.Supervision

Supervision information.

Fully-processed supervision information for a whole utterance or (after splitting) part of an utterance. It contains the time limits on phones encoded into the FST.

check(trans_model:TransitionModel)

Checks if some of the expected properties are satisfied.

frames_per_sequence

Number of frames per sequence (default=-1).

from_other(other:Supervision) → Supervision

Creates a new Supervision object from another.

fst

Supervision FST.

label_dim

Maximum possible value of the labels in fst (default=-1).

num_sequences

Number of sequences (default=1).

read(is:istream, binary:bool)

Reads Supervision object from input stream.

swap(other:Supervision)

Swaps contents with another Supervision object.

weight

Weight of this example (default=1.0).

write(os:ostream, binary:bool)

Writes Supervision object to output stream.

class kaldi.chain.SupervisionOptions

Supervision options.

check()

Checks if options are valid.

convert_to_pdfs

Convert transition-ids to pdf-ids + 1 in supervision FST

frame_subsampling_factor

Frame subsampling factor

left_tolerance

Left tolerance for shift in phone position relative to the alignment

lm_scale

The scale on graph weights from phone lattice included in the supervision FST

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters:opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
right_tolerance

Right tolerance for shift in phone position relative to the alignment

weight

Supervision weight for training

class kaldi.chain.SupervisionSplitter

Supervision splitter.

This is used for splitting a Supervision object into multiple Supervision objects corresponding to different frame ranges.

Parameters:supervision (Supervision) – Input Supervision object.
get_frame_range(begin_frame:int, frames_per_sequence:int) → Supervision

Extracts the Supervision object for the given frame range.

class kaldi.chain.TimeEnforcerFst

Deterministic on-demand FST to limit the frames each phone is allowed.

This class wraps the vector of allowed phones for each frame to create a DeterministicOnDemandFst that we can compose with the decoding-graph FST to limit the frames on which these phones are allowed to appear. This FST also helps us convert the labels from transition-ids to (pdf-ids plus one), which is what we’ll be using in the forward-backward (it avoids the need to keep the transition model around).

Suppose the number of frames is T; then there will be T+1 states in this FST, numbered from 0 to T, where state 0 is initial and state T is final. A transition is only allowed from state t to state t+1 with a particular transition-id as its ilabel if the corresponding phone is listed in the ‘allowed_phones’ for that frame. The olabels are pdf-ids plus one.
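
The behaviour described above can be mimicked with a small pure-Python stand-in. This is a sketch only: the class name, the `is_final` helper, the arc tuple layout, and the `tid_to_phone_pdf` lookup table (which plays the role of the transition model) are assumptions for illustration. With T frames it uses states 0 through T, and only state T is final.

```python
class ToyTimeEnforcer:
    """Pure-Python stand-in for an on-demand time-enforcing acceptor.

    tid_to_phone_pdf is a hypothetical lookup table standing in for the
    transition model: transition-id -> (phone, pdf-id).
    """

    def __init__(self, tid_to_phone_pdf, convert_to_pdfs, allowed_phones):
        self.tid_to_phone_pdf = tid_to_phone_pdf
        self.convert_to_pdfs = convert_to_pdfs
        self.allowed_phones = allowed_phones  # one list of phones per frame

    def start(self):
        return 0

    def is_final(self, state):
        # With T frames, states run 0..T and only state T is final.
        return state == len(self.allowed_phones)

    def get_arc(self, state, ilabel):
        """Return (success, (ilabel, olabel, next_state))."""
        if state >= len(self.allowed_phones):
            return False, None  # no frames left to consume
        phone, pdf_id = self.tid_to_phone_pdf[ilabel]
        if phone not in self.allowed_phones[state]:
            return False, None  # phone not permitted on this frame
        olabel = pdf_id + 1 if self.convert_to_pdfs else ilabel
        return True, (ilabel, olabel, state + 1)
```

Arcs are built only when asked for, which is what makes the FST "on demand": nothing is stored beyond the per-frame phone lists.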

Parameters:
  • trans_model (TransitionModel) – Transition model
  • convert_to_pdfs (bool) – If True, this FST will map from transition-id on the input side to pdf-id plus one on the output side. Otherwise, both sides’ labels will be transition-id.
  • allowed_phones (List[List[int]]) – Phones allowed at each frame.
final(state:int) → TropicalWeight

Returns the final weight of the given state.

get_arc(s:int, ilabel:int) -> (success:bool, oarc:StdArc)

Creates an on demand arc and returns it.

Parameters:
  • s (int) – State index.
  • ilabel (int) – Arc label.
Returns:

The created arc.

start() → int

Returns the start state index.

kaldi.chain.add_weight_to_supervision_fst(normalization_fst:StdVectorFst, supervision:Supervision) → bool

Adds weights to supervision FST by composing it with normalization FST.

kaldi.chain.alignment_to_proto_supervision(opts:SupervisionOptions, phones:list<int>, durations:list<int>) -> (success:bool, proto_supervision:ProtoSupervision)

Creates a proto supervision from lists of phones and durations.

kaldi.chain.alignment_to_proto_supervision_with_phones_durs(opts:SupervisionOptions, phones_durs:list<tuple<int, int>>) -> (success:bool, proto_supervision:ProtoSupervision)

Creates a proto supervision from a list of (phone, duration) pairs.
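
One plausible way to turn (phone, duration) pairs into per-frame allowed phones is sketched below in pure Python. The function name and the exact rounding are assumptions rather than a statement of what the library does; the keyword arguments simply reuse the names of the SupervisionOptions attributes. Each phone's aligned span is widened by the tolerances, clipped to the utterance, and then mapped onto the subsampled frame grid.

```python
def allowed_phones_from_alignment(phones_durs, left_tolerance=0,
                                  right_tolerance=0,
                                  frame_subsampling_factor=1):
    """Toy computation of per-frame allowed phones (assumed scheme)."""
    factor = frame_subsampling_factor
    num_frames = sum(dur for _, dur in phones_durs)
    num_subsampled = (num_frames + factor - 1) // factor
    allowed = [[] for _ in range(num_subsampled)]
    start = 0
    for phone, dur in phones_durs:
        # Widen the aligned span by the tolerances, clipped to the utterance.
        t_begin = max(0, start - left_tolerance)
        t_end = min(num_frames, start + dur + right_tolerance)
        # Round both ends up to the subsampled frame grid.
        for t in range((t_begin + factor - 1) // factor,
                       (t_end + factor - 1) // factor):
            if phone not in allowed[t]:
                allowed[t].append(phone)
        start += dur
    return allowed
```

With zero tolerances and no subsampling each frame allows exactly its aligned phone; larger tolerances let neighbouring phones overlap near their boundaries.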

kaldi.chain.compute_chain_objf_and_deriv(opts:ChainTrainingOptions, den_graph:DenominatorGraph, supervision:Supervision, nnet_output:CuMatrixBase, nnet_output_deriv:CuMatrixBase, xent_output_deriv:CuMatrix) -> (objf:float, l2_term:float, weight:float)

Does both the numerator and denominator parts of the chain computation in one call.

Parameters:
  • opts (ChainTrainingOptions) – Struct containing options
  • den_graph (DenominatorGraph) – The denominator graph, derived from denominator fst.
  • supervision (Supervision) – The supervision object containing the supervision paths and constraints
  • nnet_output (CuMatrixBase) – The output of the neural net; dimension must equal ((supervision.num_sequences * supervision.frames_per_sequence) by den_graph.num_pdfs)
  • nnet_output_deriv (CuMatrixBase) – The derivative of the objective function w.r.t. the neural net output
  • xent_output_deriv (CuMatrix) – If non-NULL, then the numerator part of the derivative (equals the posterior from the numerator forward-backward, scaled by the supervision weight)
Returns:
  • objf (float) – The [num - den] objective function computed for this example
  • l2_term (float) – The l2 regularization term in the objective function
  • weight (float) – The weight to normalize the objective function by
Return type:

(float, float, float)
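
A caller usually wants a normalized objective rather than the raw triple. The pure-Python sketch below shows one way to accumulate the three returned values over several minibatches and to check the expected shape of nnet_output. The class and function names are hypothetical, and folding the regularization term into the average is an assumption about how one might report the result.

```python
class ObjectiveAccumulator:
    """Accumulates (objf, l2_term, weight) triples across minibatches."""

    def __init__(self):
        self.total_objf = 0.0
        self.total_l2 = 0.0
        self.total_weight = 0.0

    def add(self, objf, l2_term, weight):
        self.total_objf += objf
        self.total_l2 += l2_term
        self.total_weight += weight

    def average(self):
        # Objective per unit of weight, including the regularization term.
        return (self.total_objf + self.total_l2) / self.total_weight


def expected_nnet_output_shape(num_sequences, frames_per_sequence, num_pdfs):
    # Rows: all frames of all sequences; columns: one per pdf.
    return (num_sequences * frames_per_sequence, num_pdfs)
```

Summing the weights before dividing, instead of averaging per-minibatch ratios, keeps minibatches of different sizes correctly weighted.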

kaldi.chain.compute_fst_state_times(fst:StdVectorFst) -> (path_length:int, state_times:list<int>)

Computes the times for FST states.

Assuming the ‘fst’ is epsilon-free, connected, and has the property that all paths from the start-state to any given state are of the same length, this function outputs that length for each state (measured from the start-state to that state) in ‘state_times’. The member ‘fst’ of struct Supervision has this property.

Similar to lattice_state_times and compact_lattice_state_times, except that it does not allow epsilons, not because they are hard to handle but because in this context we don’t expect them. This function also expects that the input fst will have the property that the state times are in nondecreasing order (as sort_breadth_first_search will accomplish for FSTs satisfying the other properties we mentioned). This just happens to be something we enforce while creating these FSTs.

Parameters:fst (StdVectorFst) – Input fst; epsilon-free; connected; nonempty; should have the property that all paths to a given state should have the same number of arcs; and states should be sorted on this path length
Returns:The path length and the state times.
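
The idea can be sketched in pure Python on a plain arc list. The function name and input layout are assumptions for illustration, and the sketch relies on the same precondition stated above: states are already numbered so that every arc goes from a lower-numbered to a higher-numbered state.

```python
def state_times(arcs, start, num_states):
    """Toy state-time computation; arcs is a list of (src, dst) pairs.

    Assumes every arc goes from a lower-numbered to a higher-numbered
    state. Returns (path_length, times), or raises ValueError if two
    paths reach the same state with different lengths.
    """
    times = [-1] * num_states
    times[start] = 0
    # Visiting arcs in order of source state guarantees that a state's
    # time is settled before any arc leaving it is processed.
    for src, dst in sorted(arcs):
        if times[src] < 0:
            continue  # source state is unreachable; skip it
        if times[dst] < 0:
            times[dst] = times[src] + 1
        elif times[dst] != times[src] + 1:
            raise ValueError("paths of different lengths reach state %d" % dst)
    return max(times), times
```

The breadth-first ordering is what makes a single pass sufficient here; without it a general topological sort would be needed first.
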
kaldi.chain.convert_supervision_to_unconstrained(trans_model:TransitionModel, supervision:Supervision) → bool

Converts supervision to an unconstrained supervision.

This function takes a ‘Supervision’ object that has a non-cyclic FST as its ‘fst’ member and converts it to one that has a cyclic FST in its e2e_fsts[0] and has ‘alignment_pdfs’ set to a random path through the original ‘fst’ (this is used only in the binary nnet3-chain-acc-lda-stats). This can be used to train without any constraints on the alignment of phones internal to chunks, while still imposing constraints at chunk boundaries.

kaldi.chain.create_denominator_fst(ctx_dep:ContextDependency, trans_model:TransitionModel, phone_lm:StdVectorFst) → StdVectorFst

Creates denominator graph.

Starting from an acceptor on phones that represents some kind of compiled language model (with no disambiguation symbols), this function creates the denominator graph.

kaldi.chain.get_weights_for_ranges(range_length:int, range_starts:list<int>) → list<Vector>

This function gets the weights for the derivatives.

Parameters:
  • range_starts (List[int]) – obtained from split_into_ranges()
  • range_length (int) – length in frames (may be longer than the one supplied to split_into_ranges)
Returns:

Output vector weights with the same dimension as range_starts.

kaldi.chain.map_fst_to_pdf_ids_plus_one(trans_model:TransitionModel, fst:StdVectorFst)

Converts transition-ids in input FST to pdf-ids plus one.

kaldi.chain.minimize_acceptor_no_push(fst:StdVectorFst)

Minimizes acceptor without weight pushing.

This is useful for constructing denominator graph.

kaldi.chain.phone_lattice_to_proto_supervision(opts:SupervisionOptions, clat:CompactLatticeVectorFst) -> (success:bool, proto_supervision:ProtoSupervision)

Creates a proto supervision from a phone-aligned phone lattice.

kaldi.chain.proto_supervision_to_supervision(ctx_dep:ContextDependencyInterface, trans_model:TransitionModel, proto_supervision:ProtoSupervision, convert_to_pdfs:bool) -> (success:bool, supervision:Supervision)

Creates a Supervision object from a ProtoSupervision object.

kaldi.chain.sort_breadth_first_search(fst:StdVectorFst)

Sorts the states of the FST in a breadth-first search order.
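
For readers unfamiliar with the idea, the pure-Python sketch below computes a breadth-first renumbering on a toy graph given as (src, dst) pairs. The function name, the input layout, and the decision to number unreachable states last are assumptions for illustration.

```python
from collections import deque

def breadth_first_order(arcs, start, num_states):
    """Return the old -> new state numbering from a breadth-first traversal.

    arcs: list of (src, dst) pairs on a toy graph, not a real FST.
    States unreachable from start are numbered last, in their old order.
    """
    successors = [[] for _ in range(num_states)]
    for src, dst in arcs:
        successors[src].append(dst)
    order = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors[state]:
            if nxt not in order:
                order[nxt] = len(order)  # next unused number
                queue.append(nxt)
    for state in range(num_states):
        order.setdefault(state, len(order))
    return [order[s] for s in range(num_states)]
```

When all paths to a state have the same length, this numbering places states in nondecreasing order of their distance from the start state.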

kaldi.chain.split_into_ranges(num_frames:int, frames_per_range:int) → list<int>

Pseudo-randomly splits a sequence of length num_frames into pieces of length exactly frames_per_range.
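
A minimal pure-Python sketch of such a split is shown below. It is one plausible scheme, not necessarily the one the library uses: the function name, the `rng` argument, and the strategy of scattering leftover frames across the gaps around the ranges are assumptions for illustration.

```python
import random

def toy_split_into_ranges(num_frames, frames_per_range, rng=None):
    """Toy splitter: returns the start frame of each range."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    if frames_per_range > num_frames:
        return []  # the sequence is too short for even one range
    num_ranges = num_frames // frames_per_range
    extra_frames = num_frames % frames_per_range
    # Scatter the leftover frames over the gaps before, between and after
    # the ranges, so that ranges never overlap.
    gaps = [0] * (num_ranges + 1)
    for _ in range(extra_frames):
        gaps[rng.randrange(num_ranges + 1)] += 1
    starts, position = [], 0
    for i in range(num_ranges):
        position += gaps[i]
        starts.append(position)
        position += frames_per_range
    return starts
```

The returned list of start frames has the same form as the range_starts argument expected by get_weights_for_ranges.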

kaldi.chain.training_graph_to_supervision_e2e(training_graph:StdVectorFst, trans_model:TransitionModel, num_frames:int) -> (success:bool, supervision:Supervision)

Creates and initializes an end-to-end supervision object from training FST.