kaldi.chain¶

Functions

`add_weight_to_supervision_fst`	Adds weights to supervision FST by composing it with normalization FST.
`alignment_to_proto_supervision`	Creates a proto supervision from lists of phones and durations.
`alignment_to_proto_supervision_with_phones_durs`	Creates a proto supervision from a list of (phone, duration) pairs.
`compute_chain_objf_and_deriv`	Does both the numerator and denominator parts of the chain computation in one call.
`compute_fst_state_times`	Computes the times for FST states.
`convert_supervision_to_unconstrained`	Converts supervision to an unconstrained supervision.
`create_denominator_fst`	Creates denominator graph.
`get_weights_for_ranges`	This function gets the weights for the derivatives.
`map_fst_to_pdf_ids_plus_one`	Converts transition-ids in input FST to pdf-ids plus one.
`minimize_acceptor_no_push`	Minimizes acceptor without weight pushing.
`phone_lattice_to_proto_supervision`	Creates a proto supervision from a phone-aligned phone lattice.
`proto_supervision_to_supervision`	Creates a Supervision object from a ProtoSupervision object.
`sort_breadth_first_search`	Sorts the states of the Fst in a breadth-first search order.
`split_into_ranges`	Pseudo-randomly splits a sequence of length num_frames into pieces of length exactly frames_per_range.
`training_graph_to_supervision_e2e`	Creates and initializes an end-to-end supervision object from training FST.

Classes

`ChainTrainingOptions`	Options for chain training.
`DenominatorComputation`	Denominator computer used in chain training.
`DenominatorGraph`	Denominator graph.
`DenominatorGraphTransition`	Denominator graph transition.
`GenericNumeratorComputation`	Numerator computer used in end-to-end chain training.
`LanguageModelEstimator`	Language model estimator.
`LanguageModelOptions`	Options for language model estimation.
`NumeratorComputation`	Numerator computer used in chain training.
`ProtoSupervision`	Proto supervision that is compiled into supervision.
`Supervision`	Supervision information.
`SupervisionOptions`	Supervision options.
`SupervisionSplitter`	Supervision splitter.
`TimeEnforcerFst`	Deterministic on-demand FST to limit the frames each phone is allowed.

class kaldi.chain.ChainTrainingOptions¶

Options for chain training.

l2_regularize¶: L2 regularization constant on the ‘chain’ output (default=0.0).

leaky_hmm_coefficient¶: Coefficient for ‘Leaky HMM’ (default=1.0e-05).

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

xent_regularize¶: Cross-entropy regularization constant (default=0.0).

class kaldi.chain.DenominatorComputation¶

Denominator computer used in chain training.

This does forward-backward in parallel on a number of sequences, using a single HMM.

Parameters:	opts (ChainTrainingOptions) – Options for chain training.
	den_graph (DenominatorGraph) – The HMM to use for the denominator.
	num_sequences (int) – Number of separate time sequences to work with.
	nnet_output (CuMatrix) – The output of the neural network for this minibatch.

backward(deriv_weight:float, nnet_output_deriv:CuMatrixBase) → bool¶

Does the backward computation.

Parameters:	deriv_weight (float) – Weight for the derivative of the log-prob w.r.t. the nnet output.
	nnet_output_deriv (CuMatrix) – Output matrix.
Returns:	False if a failure is detected, True otherwise.
Return type:	bool

forward() → float¶

Does the forward computation.

Returns:	Total negated log-likelihood summed over all sequences.
Return type:	float

class kaldi.chain.DenominatorGraph¶

Denominator graph.

This class is responsible for storing the FST that we use as the ‘anti-model’ or ‘denominator-model’, which models all possible phone sequences (or most possible phone sequences, depending on how we built it). It stores the FST in a format where we can access both the transitions out of each state and the transitions into each state.

Parameters:	fst (StdVectorFst) – The denominator model FST. It should be an epsilon-free acceptor with labels representing pdf-ids + 1.
	num_pdfs (int) – The number of PDFs. Used only for checking.

Note

Supports both GPU and non-GPU operation, but is optimized for GPU.

get_normalization_fst(ifst:StdVectorFst) → StdVectorFst¶: Outputs the normalization FST.

initial_probs() → CuVector¶: Returns the initial probabilities of HMM states.

num_pdfs() → int¶: Returns number of PDFs in the HMM.

num_states() → int¶: Returns number of states in the HMM.

scale_initial_probs(s:float)¶: Scales initial probabilities of HMM states.

class kaldi.chain.DenominatorGraphTransition¶

Denominator graph transition.

hmm_state¶: HMM state

pdf_id¶: PDF id

transition_prob¶: Transition probability

class kaldi.chain.GenericNumeratorComputation¶

Numerator computer used in end-to-end chain training.

This does forward-backward of the end-to-end ‘supervision’ (numerator) FSTs. This kind of FST can have self-loops.

Parameters:	supervision (Supervision) – Supervision for this minibatch.
	nnet_output (CuMatrix) – The output of the neural network for this minibatch.

compute_objf() → float¶: Computes the objective function.

forward_backward(total_loglike:float, nnet_output_deriv:CuMatrixBase) → bool¶: Does the forward-backward computation.

class kaldi.chain.LanguageModelEstimator(opts:LanguageModelOptions)¶

Language model estimator.

This estimates an n-gram language model with a kind of ‘hard’ backoff that is intended to reduce the number of arcs in the final compiled FST. Basically, we never back off to the lower-order n-gram state, but we sometimes just say, “this state’s count is too small so we won’t have this state at all”, and this LM state disappears and transitions to it go to the lower-order n-gram’s state.

This language model is implemented as a set of states, and transitions between these states; there is no concept of a backoff transition here. Because this maps very naturally to an FST, we output it as an FST.

Parameters:	opts (LanguageModelOptions) – Options for Language model estimation.

add_counts(sentence:list<int>)¶

Adds counts for input sentence.

Parameters:	sentence (List[int]) – Input sentence. It should not contain zeros.

estimate() → StdVectorFst¶

Estimates the LM.

Returns:	Output LM as an FST.
Return type:	StdVectorFst

class kaldi.chain.LanguageModelOptions¶

Options for language model estimation.

These options are for an un-smoothed (phonetic) language model of a certain order (e.g. triphone) used as the ‘denominator graph’ in acoustic model estimation. The reason for avoiding smoothing is to reduce the number of transitions in the language model, which will improve the efficiency of training.

ngram_order¶: n-gram order for the (phonetic) language model

no_prune_ngram_order¶: The n-gram order below which the language model is not pruned

num_extra_lm_states¶: Desired number of extra LM states to keep for long n-grams

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

class kaldi.chain.NumeratorComputation¶

Numerator computer used in chain training.

This does forward-backward of the ‘supervision’ (numerator) FSTs.

Parameters:	supervision (Supervision) – Supervision for this minibatch.
	nnet_output (CuMatrix) – The output of the neural network for this minibatch.

backward(nnet_output_deriv:CuMatrixBase)¶

Does the backward computation.

Parameters:	nnet_output_deriv (CuMatrix) – Output matrix.

forward() → float¶

Does the forward computation.

Returns:	Total log-prob multiplied by supervision.weight.
Return type:	float

class kaldi.chain.ProtoSupervision¶

Proto supervision that is compiled into supervision.

allowed_phones¶: Phones allowed at each frame.

fst¶: The FST of phones; an epsilon-free acceptor.

write(os:ostream, binary:bool)¶: Writes to output stream for debugging.

class kaldi.chain.Supervision¶

Supervision information.

Fully-processed supervision information for a whole utterance or (after splitting) part of an utterance. It contains the time limits on phones encoded into the FST.

check(trans_model:TransitionModel)¶: Checks if some of the expected properties are satisfied.

frames_per_sequence¶: Number of frames per sequence (default=-1).

from_other(other:Supervision) → Supervision¶: Creates a new Supervision object from another.

fst¶: Supervision FST.

label_dim¶: Maximum possible value of the labels in fst (default=-1).

num_sequences¶: Number of sequences (default=1).

read(is:istream, binary:bool)¶: Reads Supervision object from input stream.

swap(other:Supervision)¶: Swaps contents with another Supervision object.

weight¶: Weight of this example (default=1.0).

write(os:ostream, binary:bool)¶: Writes Supervision object to output stream.

class kaldi.chain.SupervisionOptions¶

Supervision options.

check()¶: Checks if options are valid.

convert_to_pdfs¶: Convert transition-ids to pdf-ids + 1 in supervision FST

frame_subsampling_factor¶: Frame subsampling factor

left_tolerance¶: Left tolerance for shift in phone position relative to the alignment

lm_scale¶: The scale on graph weights from phone lattice included in the supervision FST

register(opts:OptionsItf)¶

Registers options with an object implementing the options interface.

Parameters:	opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

right_tolerance¶: Right tolerance for shift in phone position relative to the alignment

weight¶: Supervision weight for training

class kaldi.chain.SupervisionSplitter¶

Supervision splitter.

This is used for splitting a Supervision object into multiple Supervision objects corresponding to different frame ranges.

Parameters:	supervision (Supervision) – Input Supervision object.

get_frame_range(begin_frame:int, frames_per_sequence:int) → Supervision¶: Extracts the Supervision object for the given frame range.

class kaldi.chain.TimeEnforcerFst¶

Deterministic on-demand FST to limit the frames each phone is allowed.

This class wraps the vector of allowed phones for each frame to create a DeterministicOnDemandFst that we can compose with the decoding-graph FST to limit the frames on which these phones are allowed to appear. This FST also helps us convert the labels from transition-ids to (pdf-ids plus one), which is what we’ll be using in the forward-backward (it avoids the need to keep the transition model around).

Suppose the number of frames is T. Then there will be T+1 states in this FST, numbered from 0 to T, where state 0 is initial and state T is final. A transition is only allowed from state t to state t+1 with a particular transition-id as its ilabel if the corresponding phone is listed in the ‘allowed_phones’ for that frame. The olabels are pdf-ids plus one (or transition-ids, if convert_to_pdfs is False).

Parameters:	trans_model (TransitionModel) – Transition model.
	convert_to_pdfs (bool) – If True, this FST will map from transition-id on the input side to pdf-id plus one on the output side. Otherwise, both sides’ labels will be transition-ids.
	allowed_phones (List[List[int]]) – Phones allowed at each frame.
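The state layout described above can be mimicked in a few lines of plain Python. This is a sketch only, not PyKaldi code: `TimeEnforcerSketch`, `TID_TO_PHONE` and `TID_TO_PDF` are hypothetical stand-ins for the wrapped class and for the transition-model lookups, and arcs are plain tuples rather than StdArc objects.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables standing in for a TransitionModel.
TID_TO_PHONE: Dict[int, int] = {1: 10, 2: 10, 3: 20}
TID_TO_PDF: Dict[int, int] = {1: 0, 2: 1, 3: 2}

Arc = Tuple[int, int, int]  # (ilabel, olabel, nextstate)


class TimeEnforcerSketch:
    """Toy model of an on-demand FST with one state per frame boundary."""

    def __init__(self, allowed_phones: List[List[int]],
                 convert_to_pdfs: bool = True) -> None:
        self.allowed_phones = allowed_phones
        self.convert_to_pdfs = convert_to_pdfs

    def start(self) -> int:
        return 0

    def is_final(self, state: int) -> bool:
        # In this sketch the only final state is the one after the last frame.
        return state == len(self.allowed_phones)

    def get_arc(self, s: int, ilabel: int) -> Tuple[bool, Optional[Arc]]:
        # No arcs leave the last state.
        if s >= len(self.allowed_phones):
            return False, None
        # The arc exists only if the phone is allowed on frame s.
        if TID_TO_PHONE[ilabel] not in self.allowed_phones[s]:
            return False, None
        olabel = TID_TO_PDF[ilabel] + 1 if self.convert_to_pdfs else ilabel
        return True, (ilabel, olabel, s + 1)
```

With two frames, where phone 10 is allowed on frame 0 and phones 10 and 20 on frame 1, `get_arc(0, 3)` fails because transition-id 3 belongs to phone 20, while `get_arc(1, 3)` succeeds and moves to state 2.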

final(state:int) → TropicalWeight¶: Returns the final weight of the given state.

get_arc(s:int, ilabel:int) -> (success:bool, oarc:StdArc)¶

Creates an on demand arc and returns it.

Parameters:	s (int) – State index.
	ilabel (int) – Arc label.
Returns:	A success flag and the created arc.

start() → int¶: Returns the start state index.

kaldi.chain.add_weight_to_supervision_fst(normalization_fst:StdVectorFst, supervision:Supervision) → bool¶: Adds weights to supervision FST by composing it with normalization FST.

kaldi.chain.alignment_to_proto_supervision(opts:SupervisionOptions, phones:list<int>, durations:list<int>) -> (success:bool, proto_supervision:ProtoSupervision)¶: Creates a proto supervision from lists of phones and durations.
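To illustrate how phones, durations and the tolerance options could combine into per-frame allowed-phone lists, here is a plain-Python sketch. It is an assumption about the behaviour, not the library's code: `allowed_phones_sketch` is a hypothetical helper, it takes the tolerances and the subsampling factor as plain integers instead of a SupervisionOptions object, and it does not build the phone FST.

```python
from typing import List


def allowed_phones_sketch(phones: List[int], durations: List[int],
                          left_tolerance: int, right_tolerance: int,
                          frame_subsampling_factor: int) -> List[List[int]]:
    """Lists, for each (subsampled) frame, the phones allowed on it."""
    factor = frame_subsampling_factor
    num_frames = sum(durations)
    # Round up so that a trailing partial group of frames still gets a slot.
    num_subsampled = (num_frames + factor - 1) // factor
    allowed: List[List[int]] = [[] for _ in range(num_subsampled)]

    current_frame = 0
    for phone, duration in zip(phones, durations):
        # Widen the aligned segment by the tolerances, clipped to the utterance.
        t_start = max(0, current_frame - left_tolerance)
        t_end = min(num_frames, current_frame + duration + right_tolerance)
        # Map both ends to subsampled frame indices (rounding up).
        first = (t_start + factor - 1) // factor
        last = (t_end + factor - 1) // factor
        for t in range(first, last):
            allowed[t].append(phone)
        current_frame += duration
    return allowed
```

For two phones of three frames each and a tolerance of one frame on both sides, the middle frames allow both phones, which is the slack the tolerances are meant to provide.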

kaldi.chain.alignment_to_proto_supervision_with_phones_durs(opts:SupervisionOptions, phones_durs:list<tuple<int, int>>) -> (success:bool, proto_supervision:ProtoSupervision)¶: Creates a proto supervision from a list of (phone, duration) pairs.

kaldi.chain.compute_chain_objf_and_deriv(opts:ChainTrainingOptions, den_graph:DenominatorGraph, supervision:Supervision, nnet_output:CuMatrixBase, nnet_output_deriv:CuMatrixBase, xent_output_deriv:CuMatrix) -> (objf:float, l2_term:float, weight:float)¶

Does both the numerator and denominator parts of the chain computation in one call.

Parameters:	opts (ChainTrainingOptions) – Struct containing options.
	den_graph (DenominatorGraph) – The denominator graph, derived from the denominator FST.
	supervision (Supervision) – The supervision object, containing the supervision paths and constraints.
	nnet_output (CuMatrixBase) – The output of the neural net; its dimension must equal (supervision.num_sequences * supervision.frames_per_sequence) by den_graph.num_pdfs().
	nnet_output_deriv (CuMatrixBase) – The derivative of the objective function w.r.t. the neural net output.
	xent_output_deriv (CuMatrix) – If non-NULL, the numerator part of the derivative (equal to the posterior from the numerator forward-backward, scaled by the supervision weight) is written here.
Returns:	objf (float) – The [num - den] objective function computed for this example.
	l2_term (float) – The l2 regularization term in the objective function.
	weight (float) – The weight to normalize the objective function by.
Return type:	(float, float, float)
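The dimension requirement on nnet_output and the role of the returned weight come down to simple arithmetic, sketched below in plain Python. Both helpers are hypothetical and not part of PyKaldi; the second one only shows the division implied by "the weight to normalize the objective function by".

```python
from typing import Tuple


def expected_nnet_output_shape(num_sequences: int, frames_per_sequence: int,
                               num_pdfs: int) -> Tuple[int, int]:
    """Rows and columns that nnet_output is expected to have."""
    return num_sequences * frames_per_sequence, num_pdfs


def normalized_objf(objf: float, weight: float) -> float:
    """Objective function normalized by the returned weight."""
    if weight <= 0.0:
        raise ValueError("weight must be positive")
    return objf / weight
```

For a minibatch of 64 sequences with 50 frames each and 3000 PDFs (illustrative numbers), the network output would need 3200 rows and 3000 columns.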

kaldi.chain.compute_fst_state_times(fst:StdVectorFst) -> (path_length:int, state_times:list<int>)¶

Computes the times for FST states.

Assuming the ‘fst’ is epsilon-free, connected, and has the property that all paths from the start-state are of the same length, output a vector containing that length (from the start-state to the current state) to ‘state_times’. The member ‘fst’ of struct Supervision has this property.

Similar to lattice_state_times and compact_lattice_state_times, except that it does not allow epsilons, not because they are hard to handle but because in this context we don’t expect them. This function also expects that the input fst will have the property that the state times are in nondecreasing order (as sort_breadth_first_search will accomplish for FSTs satisfying the other properties we mentioned). This just happens to be something we enforce while creating these FSTs.

Parameters:	fst (StdVectorFst) – Input fst; epsilon-free; connected; nonempty; should have the property that all paths to a given state should have the same number of arcs; and states should be sorted on this path length
Returns:	The path length and the state times.
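The computation described above can be sketched on a toy graph in plain Python. This is not the PyKaldi implementation: `state_times_sketch` is a hypothetical helper, the FST is represented as a dict from state to a list of (label, next_state) pairs, and state 0 is assumed to be the start state.

```python
from typing import Dict, List, Set, Tuple


def state_times_sketch(arcs: Dict[int, List[Tuple[int, int]]],
                       finals: Set[int],
                       num_states: int) -> Tuple[int, List[int]]:
    """Returns (path_length, state_times) for a time-sorted acyclic graph."""
    state_times = [-1] * num_states
    state_times[0] = 0  # state 0 is taken to be the start state
    path_length = -1
    for s in range(num_states):
        t = state_times[s]
        if t < 0:
            raise ValueError("state %d is unreachable or not sorted by time" % s)
        for label, next_state in arcs.get(s, []):
            if label == 0:
                raise ValueError("epsilon arcs are not expected here")
            if state_times[next_state] == -1:
                state_times[next_state] = t + 1
            elif state_times[next_state] != t + 1:
                raise ValueError("paths to state %d differ in length" % next_state)
        if s in finals:
            # All final states must be reached after the same number of arcs.
            if path_length not in (-1, t):
                raise ValueError("final states have different times")
            path_length = t
    return path_length, state_times
```

On a diamond-shaped graph with two paths of two arcs each, every state gets a single well-defined time and the path length is 2.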

kaldi.chain.convert_supervision_to_unconstrained(trans_model:TransitionModel, supervision:Supervision) → bool¶

Converts supervision to an unconstrained supervision.

This function takes a ‘Supervision’ object that has a non-cyclic FST as its ‘fst’ member and converts it to one that has a cyclic FST in its e2e_fsts[0], with ‘alignment_pdfs’ set to a random path through the original ‘fst’ (this is used only in the binary nnet3-chain-acc-lda-stats). This can be used to train without any constraints on the alignment of phones internal to chunks, while still imposing constraints at chunk boundaries.

kaldi.chain.create_denominator_fst(ctx_dep:ContextDependency, trans_model:TransitionModel, phone_lm:StdVectorFst) → StdVectorFst¶

Creates denominator graph.

Starting from an acceptor on phones that represents some kind of compiled language model (with no disambiguation symbols), this function creates the denominator graph.

kaldi.chain.get_weights_for_ranges(range_length:int, range_starts:list<int>) → list<Vector>¶

This function gets the weights for the derivatives.

Parameters:	range_length (int) – Length in frames (may be longer than the one supplied to split_into_ranges).
	range_starts (List[int]) – Range start frames, obtained from `split_into_ranges()`.
Returns:	Output vector weights with the same dimension as range_starts.

kaldi.chain.map_fst_to_pdf_ids_plus_one(trans_model:TransitionModel, fst:StdVectorFst)¶: Converts transition-ids in input FST to pdf-ids plus one.

kaldi.chain.minimize_acceptor_no_push(fst:StdVectorFst)¶

Minimizes acceptor without weight pushing.

This is useful for constructing denominator graph.

kaldi.chain.phone_lattice_to_proto_supervision(opts:SupervisionOptions, clat:CompactLatticeVectorFst) -> (success:bool, proto_supervision:ProtoSupervision)¶: Creates a proto supervision from a phone-aligned phone lattice.

kaldi.chain.proto_supervision_to_supervision(ctx_dep:ContextDependencyInterface, trans_model:TransitionModel, proto_supervision:ProtoSupervision, convert_to_pdfs:bool) -> (success:bool, supervision:Supervision)¶: Creates a Supervision object from a ProtoSupervision object.

kaldi.chain.sort_breadth_first_search(fst:StdVectorFst)¶: Sorts the states of the Fst in a breadth-first search order.

kaldi.chain.split_into_ranges(num_frames:int, frames_per_range:int) → list<int>¶: Pseudo-randomly splits a sequence of length num_frames into pieces of length exactly frames_per_range.
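As a rough illustration of what such a split can look like, the sketch below places non-overlapping ranges and scatters the leftover frames as random gaps. This is a deliberately simplified stand-in, not the algorithm PyKaldi or Kaldi uses: `split_into_ranges_sketch` is a hypothetical helper, it never produces overlapping ranges, and its seeded random generator is an assumption made for reproducibility.

```python
import random
from typing import List, Optional


def split_into_ranges_sketch(num_frames: int, frames_per_range: int,
                             rng: Optional[random.Random] = None) -> List[int]:
    """Returns start frames of non-overlapping ranges of equal length."""
    rng = rng or random.Random(0)
    if frames_per_range <= 0 or frames_per_range > num_frames:
        return []
    num_ranges = num_frames // frames_per_range
    extra_frames = num_frames % frames_per_range

    # Scatter the leftover frames over the gaps before, between and after ranges.
    gaps = [0] * (num_ranges + 1)
    for _ in range(extra_frames):
        gaps[rng.randrange(num_ranges + 1)] += 1

    range_starts = []
    position = 0
    for i in range(num_ranges):
        position += gaps[i]
        range_starts.append(position)
        position += frames_per_range
    return range_starts
```

Every returned start `s` satisfies `s + frames_per_range <= num_frames`, so each range can be cut out of the sequence directly.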

kaldi.chain.training_graph_to_supervision_e2e(training_graph:StdVectorFst, trans_model:TransitionModel, num_frames:int) -> (success:bool, supervision:Supervision)¶: Creates and initializes an end-to-end supervision object from training FST.