kaldi.hmm

HMM Topology

The following would be the text form for the “normal” HMM topology. Note that the first state is the start state, and the final state, which must have no output transitions and must be nonemitting, has an exit probability of one (no other state can have nonzero exit probability; you can treat the transition probability to the final state as an exit probability).

Note also that it’s valid to omit the “<PdfClass>” entry of the <State>, which will mean we won’t have a pdf on that state [non-emitting state]. This is equivalent to setting the <PdfClass> to -1. We do this normally just for the final state.

The Topology object can have multiple <TopologyEntry> blocks. This is useful if there are multiple types of topology in the system.

<Topology>
<TopologyEntry>
<ForPhones> 1 2 3 4 5 6 7 8 </ForPhones>
<State> 0 <PdfClass> 0
<Transition> 0 0.5
<Transition> 1 0.5
</State>
<State> 1 <PdfClass> 1
<Transition> 1 0.5
<Transition> 2 0.5
</State>
<State> 2 <PdfClass> 2
<Transition> 2 0.5
<Transition> 3 0.5
</State>
<State> 3
</State>
</TopologyEntry>
</Topology>
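As a rough illustration of how this text form is structured, the following plain-Python sketch tokenizes a topology string and collects the states and transitions of each entry. It is not the PyKaldi reader: `parse_topology`, `check_probabilities` and `TOPOLOGY_TEXT` are hypothetical names, and only the `<State>`, `<PdfClass>` and `<Transition>` tokens are handled.

```python
NO_PDF = -1  # same convention as kaldi.hmm.NO_PDF

TOPOLOGY_TEXT = """
<Topology>
<TopologyEntry>
<ForPhones> 1 2 3 4 5 6 7 8 </ForPhones>
<State> 0 <PdfClass> 0
<Transition> 0 0.5
<Transition> 1 0.5
</State>
<State> 1 <PdfClass> 1
<Transition> 1 0.5
<Transition> 2 0.5
</State>
<State> 2 <PdfClass> 2
<Transition> 2 0.5
<Transition> 3 0.5
</State>
<State> 3
</State>
</TopologyEntry>
</Topology>
"""


def parse_topology(text):
    """Returns a list of (phones, states) pairs, one per <TopologyEntry>.

    Each state is a dict with 'pdf_class' (NO_PDF if omitted) and
    'transitions', a list of (next_state, probability) pairs.
    """
    tokens = iter(text.split())

    def expect(expected):
        tok = next(tokens)
        if tok != expected:
            raise ValueError(f"expected {expected}, got {tok}")

    entries = []
    expect("<Topology>")
    tok = next(tokens)
    while tok == "<TopologyEntry>":
        expect("<ForPhones>")
        phones = []
        tok = next(tokens)
        while tok != "</ForPhones>":
            phones.append(int(tok))
            tok = next(tokens)
        states = []
        tok = next(tokens)
        while tok == "<State>":
            index = int(next(tokens))
            if index != len(states):
                raise ValueError("states must be numbered 0, 1, 2, ...")
            state = {"pdf_class": NO_PDF, "transitions": []}
            tok = next(tokens)
            if tok == "<PdfClass>":
                state["pdf_class"] = int(next(tokens))
                tok = next(tokens)
            while tok == "<Transition>":
                dest = int(next(tokens))
                prob = float(next(tokens))
                state["transitions"].append((dest, prob))
                tok = next(tokens)
            if tok != "</State>":
                raise ValueError(f"expected </State>, got {tok}")
            states.append(state)
            tok = next(tokens)
        if tok != "</TopologyEntry>":
            raise ValueError(f"expected </TopologyEntry>, got {tok}")
        entries.append((phones, states))
        tok = next(tokens)
    if tok != "</Topology>":
        raise ValueError(f"expected </Topology>, got {tok}")
    return entries


def check_probabilities(states):
    """True if the transitions out of every non-final state sum to one."""
    return all(abs(sum(p for _, p in s["transitions"]) - 1.0) < 1e-6
               for s in states if s["transitions"])
```

In this sketch the last state ends up with `pdf_class == NO_PDF` and an empty transition list, which matches the description of the final state above.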

NO_PDF is used where a pdf_class or pdf would otherwise be used, to indicate that none is there. It is mainly useful in skippable models, but is also used for end states.

A caveat with non-emitting states is that their out-transitions are not trainable, due to technical issues with the way we decided to accumulate the stats. Any transitions arising from (*) HMM states with NO_PDF as the label are second-class transitions: they do not have “transition-states” or “transition-ids” associated with them. They are used to create the FST version of the HMMs, where they lead to epsilon arcs.

(*) “arising from” is a bit of a technical term here: if reorder == true, we put the transition-id associated with the outward arcs of a state on the input transition to that state.

Transition Model

The class TransitionModel is a repository for the transition probabilities. It also handles certain integer mappings.

The basic model is as follows. Each phone has an HMM topology. Each HMM-state of each of these phones has a number of transitions (and final-probs) out of it. Each HMM-state defined in the HmmTopology class has an associated “pdf_class”. This gets replaced with an actual pdf-id via the tree. The transition model associates the transition probs with the triple (phone, HMM-state, pdf-id). We associate with each such triple a transition-state. Each transition-state has a number of associated probabilities to estimate; this depends on the number of transitions/final-probs in the topology for that (phone, HMM-state). Each probability has an associated transition-index. We associate with each (transition-state, transition-index) a unique transition-id. Each individual probability estimated by the transition-model is associated with a transition-id.

List of the various types of quantity referred to here and what they mean:

phone
a phone index (1, 2, 3 …)
HMM-state
a number (0, 1, 2…) that indexes TopologyEntry (see hmm-topology.h)
pdf-id
a number output by the compute method of ContextDependency (it indexes pdf’s, either forward or self-loop). Zero-based.
transition-state
the states for which we estimate transition probabilities for transitions out of them. In some topologies, will map one-to-one with pdf-ids. One-based, since it appears on FSTs.
transition-index
identifier of a transition (or final-prob) in the HMM. Indexes the “transitions” vector in HmmTopology.HmmState. [if it is out of range, equal to length of transitions, it refers to the final-prob.] Zero-based.
transition-id
identifier of a unique parameter of the TransitionModel. Associated with a (transition-state, transition-index) pair. One-based, since it appears on FSTs.

List of the possible mappings TransitionModel can do:

Forward mappings:

(phone, HMM-state, forward-pdf-id, self-loop-pdf-id) -> transition-state
                (transition-state, transition-index) -> transition-id

Reverse mappings:
                                       transition-id -> transition-state
                                       transition-id -> transition-index
                                    transition-state -> phone
                                    transition-state -> HMM-state
                                    transition-state -> forward-pdf-id
                                    transition-state -> self-loop-pdf-id
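The forward and reverse mappings listed above can be pictured as a pair of lookup tables. The sketch below builds them for a toy setup in plain Python; it is only an illustration of the numbering scheme, not the TransitionModel implementation. `TOPOLOGY`, `toy_pdf_id` and `build_tables` are hypothetical, and the decision tree is replaced by a made-up rule that gives each (phone, pdf-class) its own pdf-id.

```python
# Toy topology shared by all phones: (pdf_class, [(next_state, prob), ...]).
TOPOLOGY = [
    (0, [(0, 0.5), (1, 0.5)]),
    (1, [(1, 0.5), (2, 0.5)]),
    (2, [(2, 0.5), (3, 0.5)]),
    (-1, []),  # final, non-emitting state
]
PHONES = [1, 2]


def toy_pdf_id(phone, pdf_class):
    """Hypothetical stand-in for the tree: one pdf per (phone, pdf_class)."""
    return (phone - 1) * 3 + pdf_class


def build_tables(phones, topology):
    # Collect the (phone, HMM-state, forward-pdf-id, self-loop-pdf-id) tuples.
    tuples = set()
    for phone in phones:
        for hmm_state, (pdf_class, _) in enumerate(topology):
            if pdf_class == -1:
                continue  # states without a pdf get no transition-state
            pdf = toy_pdf_id(phone, pdf_class)
            tuples.add((phone, hmm_state, pdf, pdf))
    tuples = sorted(tuples)

    # transition-states are one-based.
    state_of_tuple = {t: i + 1 for i, t in enumerate(tuples)}

    # transition-ids are one-based and enumerate (state, index) pairs.
    id_of_pair, pair_of_id = {}, {}
    next_id = 1
    for t in tuples:
        tstate = state_of_tuple[t]
        num_indices = len(topology[t[1]][1])
        for tindex in range(num_indices):
            id_of_pair[(tstate, tindex)] = next_id
            pair_of_id[next_id] = (tstate, tindex)
            next_id += 1
    return tuples, state_of_tuple, id_of_pair, pair_of_id


tuples, state_of_tuple, id_of_pair, pair_of_id = build_tables(PHONES, TOPOLOGY)

# Forward: tuple -> transition-state -> transition-id.
tstate = state_of_tuple[(2, 1, 4, 4)]
tid = id_of_pair[(tstate, 0)]

# Reverse: transition-id -> (transition-state, transition-index) -> tuple.
back_state, back_index = pair_of_id[tid]
phone, hmm_state, forward_pdf, self_loop_pdf = tuples[back_state - 1]
```

With two phones and three emitting states per phone, this toy setup yields 6 transition-states and 12 transition-ids.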

The main things the TransitionModel object can do are:

  • Get initialized (need ContextDependency and HmmTopology objects).
  • Read/write.
  • Update [given a vector of counts indexed by transition-id].
  • Do the various integer mappings mentioned above.
  • Get the probability (or log-probability) associated with a particular transition-id.

kaldi.hmm.NO_PDF = -1

Functions

accumulate_tree_stats Accumulates the stats needed for training context-dependency trees.
add_self_loops Expands an FST that has been built without self-loops.
add_transition_probs Adds transition probabilities with the supplied scales to the graph.
add_transition_probs_lat Adds transition probabilities with the supplied scales to the lattice.
convert_alignment Converts an alignment that was created using one model to another.
convert_alignment_with_phone_map Converts an alignment that was created using one model to another.
convert_phnx_to_prons Converts a phone sequence and a word sequence to a list of pronunciations.
get_h_transducer Creates the H transducer.
get_ilabel_mapping Produces a mapping from logical to physical HMMs.
get_pdfs_for_phones Works out which pdfs might correspond to the given phones.
get_phones_for_pdfs Works out which phones might correspond to the given pdfs.
merge_posteriors Merges two Posterior objects.
posterior_entries_are_disjoint Returns True if the lists have no common first element (transition-id).
read_phone_map Reads a mapping from one phone set to another.
split_to_phones Splits transition-ids in alignment into phones (one list per phone).
vector_to_posterior_entries Converts log-likelihoods to a list of posterior entries.

Classes

AccumulateTreeStatsInfo Alternative options representation for accumulating tree statistics.
AccumulateTreeStatsOptions Options for accumulating tree statistics.
HTransducerConfig Configuration options for the H transducer.
HmmTopology HMM topology information for phones.
MapTransitionUpdateConfig Options for MAP estimation of transition probabilities.
MleTransitionUpdateConfig Options for MLE estimation of transition probabilities.
Posterior Wrapper for frame posteriors.
TransitionModel Transition model.
class kaldi.hmm.AccumulateTreeStatsInfo

Alternative options representation for accumulating tree statistics.

Parameters: opts (AccumulateTreeStatsOptions) – Options for accumulating tree statistics.

central_position

Central position of context-window, zero-based (default=1).

ci_phones

List of integer indices for context-independent phones.

context_width

Context window size (default=3).

phone_map

List of old->new phone mappings.

var_floor

Variance floor for tree clustering (default=0.01).

class kaldi.hmm.AccumulateTreeStatsOptions

Options for accumulating tree statistics.

central_position

Central position of context-window, zero-based (default=1).

ci_phones_str

Colon-separated list of integer indices for context-independent phones.

context_width

Context window size (default=3).

phone_map_rxfilename

Extended filename for the old->new phone mappings.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

var_floor

Variance floor for tree clustering (default=0.01).

class kaldi.hmm.HTransducerConfig

Configuration options for the H transducer.

nonterm_phones_offset

The integer index of the first non-terminal symbol.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

transition_scale

Scale of transition probabilities (relative to language model).

class kaldi.hmm.HmmTopology

HMM topology information for phones.

class HmmState

HMM state.

forward_pdf_class

Forward PDF class, typically 0, 1 or 2; -1 means non-emitting.

from_forward_and_self_pdf(forward_pdf_class:int, self_loop_pdf_class:int) → HmmState

Constructs a new HmmState object given PDF classes.

from_pdf(pdf_class:int) → HmmState

Constructs a new HmmState object from given PDF class.

Both forward and self-loop PDFs are assigned to same class (usual case).

self_loop_pdf_class

Self-loop PDF class.

transitions

List of transitions in the form (next HMM-state index, initial-transition-prob)

check()

Checks if HmmTopology object is valid.

Throws:
RuntimeError: If object is invalid.
get_phone_to_num_pdf_classes() → list<int>

Returns the number of PDF classes for each phone.

get_phones() → list<int>

Returns a sorted, unique list of phones covered by the topology.

is_hmm() → bool

Checks if HmmTopology is ‘hmm-like’.

A topology is ‘hmm-like’ if the pdf-classes on the self-loop and forward transitions of any state are identical. [note: in HMMs, the densities are associated with the states.] Topologies that are not ‘hmm-like’, where those pdf-classes are different, are also supported. For instance, ‘chain models’ (AKA lattice-free MMI) use 1-state topologies that have different pdf-classes for the self-loop and the forward transition for more compact decoding graphs. Note that we always use the ‘reorder=true’ option so the forward transition actually comes before the self-loop.
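The ‘hmm-like’ condition can be sketched in plain Python as a simple comparison over all states. The `State` tuple and the two example topologies below are hypothetical stand-ins for HmmState objects, and the one-state entry is only an assumed shape for the kind of topology mentioned above.

```python
from collections import namedtuple

# Hypothetical stand-in for HmmTopology.HmmState.
State = namedtuple(
    "State", ["forward_pdf_class", "self_loop_pdf_class", "transitions"])


def is_hmm_like(entries):
    """entries maps a phone to its list of states.

    Returns True if every state uses the same pdf-class on its
    self-loop and on its forward transitions.
    """
    return all(s.forward_pdf_class == s.self_loop_pdf_class
               for states in entries.values()
               for s in states)


# Three emitting states, one pdf-class per state (the usual case).
NORMAL = {
    1: [
        State(0, 0, [(0, 0.5), (1, 0.5)]),
        State(1, 1, [(1, 0.5), (2, 0.5)]),
        State(2, 2, [(2, 0.5), (3, 0.5)]),
        State(-1, -1, []),
    ]
}

# One emitting state whose self-loop uses a different pdf-class.
ONE_STATE = {
    1: [
        State(0, 1, [(0, 0.5), (1, 0.5)]),
        State(-1, -1, []),
    ]
}
```

Here `is_hmm_like(NORMAL)` holds while `is_hmm_like(ONE_STATE)` does not, because the single emitting state carries two different pdf-classes.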

min_length(phone:int) → int

Returns the minimum number of frames it takes to traverse the HMM for given phone.
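One way to compute such a minimum is a shortest-path relaxation in which entering an emitting state costs one frame and entering a non-emitting state costs nothing. The plain-Python sketch below assumes state 0 is the start state and the last state is the final state; `min_length` here is a hypothetical free function, not the method itself.

```python
import math


def min_length(states):
    """states: list of (pdf_class, [(next_state, prob), ...]).

    Returns the smallest number of emitting states visited on any path
    from state 0 to the last state.
    """
    cost = [0 if pdf_class == -1 else 1 for pdf_class, _ in states]
    best = [math.inf] * len(states)
    best[0] = cost[0]
    changed = True
    while changed:  # relax until no path gets shorter
        changed = False
        for s, (_, transitions) in enumerate(states):
            if best[s] == math.inf:
                continue  # state not reached yet
            for nxt, _ in transitions:
                candidate = best[s] + cost[nxt]
                if candidate < best[nxt]:
                    best[nxt] = candidate
                    changed = True
    return best[-1]


# Left-to-right topology with three emitting states.
NORMAL = [
    (0, [(0, 0.5), (1, 0.5)]),
    (1, [(1, 0.5), (2, 0.5)]),
    (2, [(2, 0.5), (3, 0.5)]),
    (-1, []),
]

# Hypothetical topology in which state 0 can jump straight to the end.
SKIP = [
    (0, [(0, 0.5), (1, 0.25), (2, 0.25)]),
    (1, [(1, 0.5), (2, 0.5)]),
    (-1, []),
]
```

For these two examples the sketch gives 3 frames for `NORMAL` and 1 frame for `SKIP`.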

num_pdf_classes(phone:int) → int

Returns the number of PDF classes for given phone.

Throws:
RuntimeError: If the phone is not covered by the topology.
read(is:istream, binary:bool)

Reads HmmTopology object from input stream.

topology_for_phone(phone:int) → list<HmmState>

Returns the topology entry (i.e. list of HMM states) for given phone.

Throws:
RuntimeError: If the phone is not covered by the topology.
write(os:ostream, binary:bool)

Writes HmmTopology object to output stream.

class kaldi.hmm.MapTransitionUpdateConfig

Options for MAP estimation of transition probabilities.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

share_for_pdfs

Share all transition parameters where the states have the same PDF.

tau

Tau value for MAP estimation of transition probabilities.
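To give a feel for what tau does, the sketch below uses a common form of MAP smoothing in which the old probabilities act as a prior worth tau counts. This exact formula is an assumption made for illustration and may differ from what `map_update` computes; `map_update_state` is a hypothetical helper operating on the counts of a single transition-state.

```python
def map_update_state(counts, old_probs, tau):
    """Smooths the counts of one transition-state toward old_probs.

    Assumed formula: new_p[i] = (count[i] + tau * old_p[i]) / (total + tau).
    """
    total = sum(counts)
    return [(c + tau * p) / (total + tau)
            for c, p in zip(counts, old_probs)]


counts = [8.0, 2.0]
old_probs = [0.5, 0.5]

smoothed = map_update_state(counts, old_probs, tau=5.0)
unsmoothed = map_update_state(counts, old_probs, tau=0.0)
```

Under this assumption a larger tau keeps the estimate closer to the old probabilities, and tau = 0 reduces to the plain relative-frequency estimate.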

class kaldi.hmm.MleTransitionUpdateConfig

Options for MLE estimation of transition probabilities.

Parameters:
  • floor (float) – Floor for transition probabilities (default=0.01).
  • mincount (float) – Minimum count required to update transitions from a state (default=5.0).
  • share_for_pdfs (bool) – Share all transition parameters where the states have the same PDF (default=False).
floor

Floor for transition probabilities

mincount

Minimum count required to update transitions from a state

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.

share_for_pdfs

Share all transition parameters where the states have the same PDF.
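The roles of floor and mincount can be illustrated with a simplified per-state update in plain Python. This is a sketch under stated assumptions, not the actual `mle_update` code: `mle_update_state` is a hypothetical helper, and the single floor-then-renormalize step is a simplification.

```python
def mle_update_state(counts, old_probs, floor=0.01, mincount=5.0):
    """Re-estimates the transition probabilities of one transition-state.

    counts and old_probs are indexed by transition-index.
    """
    total = sum(counts)
    if total < mincount:
        # Too little data: keep the old probabilities unchanged.
        return list(old_probs)
    probs = [c / total for c in counts]
    # Apply the floor, then renormalize so the result sums to one.
    probs = [max(p, floor) for p in probs]
    norm = sum(probs)
    return [p / norm for p in probs]


few = mle_update_state([2.0, 1.0], [0.5, 0.5])
plain = mle_update_state([6.0, 4.0], [0.5, 0.5])
floored = mle_update_state([10.0, 0.0], [0.5, 0.5])
```

In this sketch a state with fewer than mincount observations is left as it was, and a transition that was never observed still keeps a small nonzero probability.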

class kaldi.hmm.Posterior

Wrapper for frame posteriors.

We wrap frame posteriors (of type list<list<tuple<int, float>>>) with this class to avoid copying frame posteriors every time a function that accepts frame posteriors is called and also to provide a more Pythonic interface.

convert_transitions_to_pdfs(tmodel:TransitionModel) → Posterior

Converts posteriors over transition-ids to posteriors over pdf-ids.

convert_transitions_to_phones(tmodel:TransitionModel) → Posterior

Converts posteriors over transition-ids to posteriors over phones.
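Both conversions amount to relabeling each entry and summing the weights that land on the same label. A plain-Python sketch on raw frame posteriors, where `convert_labels` and the lookup table `tid_to_pdf` are hypothetical stand-ins for the transition model:

```python
def convert_labels(post, label_of_tid):
    """Maps each transition-id to a new label and sums weights per label.

    post is a list of frames, each a list of (transition-id, weight).
    """
    converted = []
    for frame in post:
        summed = {}
        for tid, weight in frame:
            label = label_of_tid[tid]
            summed[label] = summed.get(label, 0.0) + weight
        converted.append(sorted(summed.items()))
    return converted


# Hypothetical table: transition-ids 1 and 2 share pdf-id 0.
tid_to_pdf = {1: 0, 2: 0, 3: 1}
post = [[(1, 0.25), (2, 0.25), (3, 0.5)]]
pdf_post = convert_labels(post, tid_to_pdf)
```

Passing a transition-id to phone table instead would give posteriors over phones in the same way.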

from_alignment(ali:list<int>) → Posterior

Creates a new Posterior object from an alignment.
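In terms of the underlying list structure, an alignment can be viewed as a posterior that puts all of the weight of each frame on a single transition-id. A minimal plain-Python sketch of that view (`alignment_to_posterior` is a hypothetical helper working on raw lists, not on Posterior objects):

```python
def alignment_to_posterior(alignment):
    """One entry per frame, with weight 1.0 on the aligned transition-id."""
    return [[(tid, 1.0)] for tid in alignment]


post = alignment_to_posterior([3, 3, 5])
```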

from_posteriors(post:list<list<tuple<int, float>>>) → Posterior

Creates a new Posterior object from frame posteriors.

get_posteriors() → list<list<tuple<int, float>>>

Returns the frame posteriors.

read(is:istream, binary:bool)

Reads Posterior object from input stream.

scale(scale:float)

Scales frame posteriors.

sort_by_pdfs(tmodel:TransitionModel)

Sorts posterior entries by pdf-ids.

At the end of this operation posterior entries for transition-ids with the same pdf-ids are next to each other.

to_matrix(post_dim:int) → Matrix

Converts frame posteriors to a matrix.

The number of matrix-rows is the same as the length of the posterior and the number of matrix-columns is defined by ‘post_dim’. The elements which are not specified in the posterior are equal to zero.
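The layout described above can be sketched with nested lists in plain Python. `posterior_to_matrix` is a hypothetical helper, and rejecting out-of-range indices with an exception is an assumption of this sketch.

```python
def posterior_to_matrix(post, post_dim):
    """Returns a len(post) x post_dim list of lists, zero where unspecified."""
    matrix = [[0.0] * post_dim for _ in post]
    for t, frame in enumerate(post):
        for index, weight in frame:
            if not 0 <= index < post_dim:
                raise IndexError(f"index {index} out of range on frame {t}")
            matrix[t][index] = weight
    return matrix


post = [[(0, 1.0)], [(1, 0.25), (2, 0.75)]]
matrix = posterior_to_matrix(post, post_dim=3)
```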

to_pdf_matrix(model:TransitionModel) → Matrix

Converts frame posteriors to a matrix.

The number of matrix-rows is the same as the length of the posterior and the number of matrix-columns is defined by the number of PDFs in the transition model. The elements which are not specified in the posterior are equal to zero.

total() → float

Returns the total of all the weights in the frame posteriors.

weight_silence(tmodel:TransitionModel, silence_set:ConstIntegerSet, silence_scale:float)

Weights silence phones.

Any silence phone in the frame posteriors (i.e. any phone in the set “silence_set”) is weighted by “silence_scale”.

weight_silence_distributed(tmodel:TransitionModel, silence_set:ConstIntegerSet, silence_scale:float)

Weights silence phones.

This is similar to weight_silence(), except that on each frame it works out the amount by which the overall posterior would be reduced, and scales down everything on that frame by the same amount. It has the effect that frames that are mostly silence get down-weighted.
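The difference between the two weighting schemes is easiest to see on a small example. The plain-Python sketch below works on raw frame posteriors; the function bodies are illustrative assumptions rather than the actual implementation, and `tid_to_phone` is a hypothetical lookup table standing in for the transition model.

```python
def weight_silence(post, tid_to_phone, silence_set, silence_scale):
    """Scales only the entries that belong to silence phones."""
    out = []
    for frame in post:
        new_frame = []
        for tid, weight in frame:
            if tid_to_phone[tid] in silence_set:
                weight *= silence_scale
            if weight != 0.0:  # assumption: zero-weight entries are dropped
                new_frame.append((tid, weight))
        out.append(new_frame)
    return out


def weight_silence_distributed(post, tid_to_phone, silence_set, silence_scale):
    """Scales every entry on a frame by the frame's overall reduction."""
    out = []
    for frame in post:
        sil = sum(abs(w) for tid, w in frame
                  if tid_to_phone[tid] in silence_set)
        nonsil = sum(abs(w) for tid, w in frame
                     if tid_to_phone[tid] not in silence_set)
        if sil + nonsil == 0.0:
            out.append(list(frame))
            continue
        # Fraction of the frame's weight that survives silence weighting.
        frame_scale = (sil * silence_scale + nonsil) / (sil + nonsil)
        if frame_scale == 0.0:
            out.append([])
        else:
            out.append([(tid, w * frame_scale) for tid, w in frame])
    return out


# Hypothetical setup: transition-id 1 belongs to silence phone 1.
tid_to_phone = {1: 1, 2: 2}
post = [[(1, 0.75), (2, 0.25)], [(2, 1.0)]]

direct = weight_silence(post, tid_to_phone, {1}, 0.0)
spread = weight_silence_distributed(post, tid_to_phone, {1}, 0.0)
```

With a silence scale of zero, the first function removes only the silence entry on the first frame, while the second scales the whole first frame by 0.25 and leaves the silence-free second frame untouched.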

write(os:ostream, binary:bool)

Writes Posterior object to output stream.

class kaldi.hmm.TransitionModel

Transition model.

accumulate(prob:float, trans_id:int, stats:DoubleVector)

Accumulates statistics.

compatible(other:TransitionModel) → bool

Checks if this transition model is compatible with another.

from_topo(ctx_dep:ContextDependencyInterface, hmm_topo:HmmTopology) → TransitionModel

Creates a new TransitionModel object.

Parameters:
  • ctx_dep (ContextDependencyInterface) – Context dependency decision tree.
  • hmm_topo (HmmTopology) – HMM topology.
get_non_self_loop_log_prob(trans_id:int) → float

Returns the log of non-self-loop probability mass for given transition.

get_phones() → list<int>

Returns a sorted, unique list of phones.

get_topo() → HmmTopology

Returns the HMM topology.

get_transition_log_prob(trans_id:int) → float

Returns the log probability associated with given transition-id.

get_transition_log_prob_ignoring_self_loops(trans_id:int) → float

Returns the log probability associated with given transition-id if self-loop is ignored.

Returns the log-probability of a particular non-self-loop transition after subtracting the probability mass of the self-loop and renormalizing. Specifically: for non-self-loops it returns log(prob-for-transition / (1 - prob-for-self-loop)).
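The renormalization can be checked numerically with a few lines of plain Python. `log_prob_ignoring_self_loop` is a hypothetical helper that takes the two probabilities directly instead of looking them up by transition-id.

```python
import math


def log_prob_ignoring_self_loop(prob, self_loop_prob):
    """log(prob / (1 - self_loop_prob)) for a non-self-loop transition."""
    return math.log(prob / (1.0 - self_loop_prob))


# A state with a self-loop of 0.75 and two forward transitions.
self_loop = 0.75
forward = [0.15, 0.10]
renormalized = [math.exp(log_prob_ignoring_self_loop(p, self_loop))
                for p in forward]
```

After renormalization the two forward transitions carry probabilities of about 0.6 and 0.4, which sum to one.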

Raises: RuntimeError – if called on a self-loop.

get_transition_prob(trans_id:int) → float

Returns the probability associated with given transition-id.

init_stats(stats:DoubleVector)

Initializes statistics.

is_final(trans_id:int) → bool

Returns True if this transition-id goes to the final state.

is_self_loop(trans_id:int) → bool

Returns True if this transition-id corresponds to a self-loop.

map_update(stats:DoubleVector, cfg:MapTransitionUpdateConfig) -> (objf_impr_out:float, count_out:float)

Does Maximum A Posteriori estimation.

The stats are counts/weights, indexed by transition-id.

mle_update(stats:DoubleVector, cfg:MleTransitionUpdateConfig) -> (objf_impr_out:float, count_out:float)

Does Maximum Likelihood estimation.

The stats are counts/weights, indexed by transition-id.

num_pdfs() → int

Returns the highest numbered PDF we ever saw plus one.

num_phones() → int

Returns the highest phone index present.

num_transition_ids() → int

Returns the total number of transition-ids.

num_transition_indices(trans_state:int) → int

Returns the number of transition-indices for given transition-state.

num_transition_states() → int

Returns the total number of transition-states.

pair_to_transition_id(trans_state:int, trans_index:int) → int

Maps (trans-state, trans-index) pair to transition-id.

print_model(os:ostream, phone_names:list<str>, occs:DoubleVector=default)

Prints a human-readable representation of transition model.

read(is:istream, binary:bool)

Reads TransitionModel object from input stream.

self_loop_of(trans_state:int) → int

Returns the self-loop transition-id, or zero if this state does not have a self-loop.

transition_id_to_hmm_state(trans_id:int) → int

Maps transition-id to hmm-state.

transition_id_to_pdf(trans_id:int) → int

Maps transition-id to pdf-id.

transition_id_to_pdf_class(trans_id:int) → int

Maps transition-id to pdf-class.

transition_id_to_pdf_fast(trans_id:int) → int

Maps transition-id to pdf-id (faster, skips an assertion).

transition_id_to_phone(trans_id:int) → int

Maps transition-id to phone.

transition_id_to_transition_index(trans_id:int) → int

Maps transition-id to transition-index.

transition_id_to_transition_state(trans_id:int) → int

Maps transition-id to transition-state.

transition_state_to_forward_pdf(trans_state:int) → int

Maps transition-state to forward-pdf-id.

transition_state_to_forward_pdf_class(trans_state:int) → int

Maps transition-state to forward-pdf-class.

transition_state_to_hmm_state(trans_state:int) → int

Maps transition-state to hmm-state.

transition_state_to_phone(trans_state:int) → int

Maps transition-state to phone.

transition_state_to_self_loop_pdf(trans_state:int) → int

Maps transition-state to self-loop-pdf-id.

transition_state_to_self_loop_pdf_class(trans_state:int) → int

Maps transition-state to self-loop-pdf-class.

tuple_to_transition_state(phone:int, hmm_state:int, pdf:int, self_loop_pdf:int) → int

Maps (phone, hmm-state, forward-pdf-id, self-loop-pdf-id) tuple to transition-state.

write(os:ostream, binary:bool)

Writes TransitionModel object to output stream.

kaldi.hmm.accumulate_tree_stats(trans_model:TransitionModel, info:AccumulateTreeStatsInfo, alignment:list<int>, features:Matrix) → dict<list<tuple<int, int>>, GaussClusterable>

Accumulates the stats needed for training context-dependency trees.

kaldi.hmm.add_self_loops(trans_model:TransitionModel, disambig_syms:list<int>, self_loop_scale:float, reorder:bool, check_no_self_loops:bool, fst:StdVectorFst)

Expands an FST that has been built without self-loops.

kaldi.hmm.add_transition_probs(trans_model:TransitionModel, disambig_syms:list<int>, transition_scale:float, self_loop_scale:float, fst:StdVectorFst)

Adds transition probabilities with the supplied scales to the graph.

kaldi.hmm.add_transition_probs_lat(trans_model:TransitionModel, transition_scale:float, self_loop_scale:float, fst:LatticeVectorFst)

Adds transition probabilities with the supplied scales to the lattice.

kaldi.hmm.convert_alignment(old_trans_model:TransitionModel, new_trans_model:TransitionModel, new_ctx_dep:ContextDependencyInterface, old_alignment:list<int>, subsample_factor:int, repeat_frames:bool, reorder:bool) -> (success:bool, new_alignment:list<int>)

Converts an alignment that was created using one model to another.

kaldi.hmm.convert_alignment_with_phone_map(old_trans_model:TransitionModel, new_trans_model:TransitionModel, new_ctx_dep:ContextDependencyInterface, old_alignment:list<int>, subsample_factor:int, repeat_frames:bool, reorder:bool, phone_map:list<int>) -> (success:bool, new_alignment:list<int>)

Converts an alignment that was created using one model to another.

kaldi.hmm.convert_phnx_to_prons(phnx:list<int>, words:list<int>, word_start_sym:int, word_end_sym:int) -> (success:bool, prons:list<list<int>>)

Converts a phone sequence and a word sequence to a list of pronunciations.

kaldi.hmm.get_h_transducer(ilabel_info:list<list<int>>, ctx_dep:ContextDependencyInterface, trans_model:TransitionModel, config:HTransducerConfig) -> (h_transducer:StdVectorFst, disambig_syms_left:list<int>)

Creates the H transducer.

kaldi.hmm.get_ilabel_mapping(ilabel_info_old:list<list<int>>, ctx_dep:ContextDependencyInterface, trans_model:TransitionModel) → list<int>

Produces a mapping from logical to physical HMMs.

kaldi.hmm.get_pdfs_for_phones(trans_model:TransitionModel, phones:list<int>) -> (ret:bool, pdfs:list<int>)

Works out which pdfs might correspond to the given phones.

Parameters:
  • trans_model (TransitionModel) – Transition-model used to work out this information
  • phones (List[int]) – A sorted, unique vector that represents a set of phones
Returns:

First return value will be True if returned pdf-ids correspond to just the given set of phones, False if they may be shared with other phones. Second return value is a sorted, unique list of pdf-ids that correspond to given set of phones.

Return type:

Tuple[bool, List[int]]
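The meaning of the two return values can be sketched in plain Python over a table of (phone, pdf-id) pairs. The table and `pdfs_for_phones` are hypothetical; the real function reads this information from the transition model.

```python
def pdfs_for_phones(phone_pdf_pairs, phones):
    """Returns (exclusive, pdfs) for the given set of phones.

    exclusive is False if any returned pdf-id is also used by a phone
    outside the given set.
    """
    phone_set = set(phones)
    pdf_set = {pdf for phone, pdf in phone_pdf_pairs if phone in phone_set}
    exclusive = all(phone in phone_set
                    for phone, pdf in phone_pdf_pairs if pdf in pdf_set)
    return exclusive, sorted(pdf_set)


# Hypothetical table in which phones 1 and 2 share pdf-id 1.
PAIRS = [(1, 0), (1, 1), (2, 1), (2, 2)]

shared = pdfs_for_phones(PAIRS, [1])
complete = pdfs_for_phones(PAIRS, [1, 2])
```

For phone 1 alone the flag is False because pdf-id 1 is shared with phone 2; for both phones together it is True.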

kaldi.hmm.get_phones_for_pdfs(trans_model:TransitionModel, pdfs:list<int>) -> (ret:bool, phones:list<int>)

Works out which phones might correspond to the given pdfs.

Parameters:
  • trans_model (TransitionModel) – Transition-model used to work out this information
  • pdfs (List[int]) – A sorted, unique vector that represents a set of pdfs
Returns:

First return value will be True if returned phones correspond to just the given set of pdfs, False if they may be shared with other pdfs. Second return value is a sorted, unique list of phones that correspond to given set of pdfs.

Return type:

Tuple[bool, List[int]]

kaldi.hmm.merge_posteriors(post1:Posterior, post2:Posterior, merge:bool, drop_frames:bool) -> (num_frames:int, post_out:Posterior)

Merges two Posterior objects.

Inputs must have the same number of frames. If “merge” is true, it will make a common entry whenever there are duplicated entries, adding up the weights. If “drop_frames” is true, for frames where the two sets of posteriors were originally disjoint, makes no entries for that frame (relates to frame dropping, or drop_frames, see Vesely et al, ICASSP 2013). Also returns the number of frames for which the two posteriors were disjoint (i.e. no common transition-ids or whatever index we are using).
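The behaviour described above can be sketched on raw frame posteriors in plain Python. This is an illustration under stated assumptions rather than the actual implementation: both helpers are hypothetical, and sorting the entries of each frame by index is a choice made here for readability.

```python
def entries_are_disjoint(entries1, entries2):
    """True if the two entry lists share no index (first element)."""
    return not ({i for i, _ in entries1} & {i for i, _ in entries2})


def merge_frame_posteriors(post1, post2, merge, drop_frames):
    """Returns (number of disjoint frames, merged posterior)."""
    if len(post1) != len(post2):
        raise ValueError("inputs must have the same number of frames")
    num_disjoint = 0
    merged = []
    for entries1, entries2 in zip(post1, post2):
        frame = sorted(list(entries1) + list(entries2))
        if merge:
            # Combine duplicated indices by adding up their weights.
            summed = {}
            for index, weight in frame:
                summed[index] = summed.get(index, 0.0) + weight
            frame = sorted(summed.items())
        if entries_are_disjoint(entries1, entries2):
            num_disjoint += 1
            if drop_frames:
                frame = []
        merged.append(frame)
    return num_disjoint, merged


post1 = [[(1, 0.5), (2, 0.5)], [(3, 1.0)]]
post2 = [[(2, 1.0)], [(4, 1.0)]]

dropped = merge_frame_posteriors(post1, post2, merge=True, drop_frames=True)
kept = merge_frame_posteriors(post1, post2, merge=False, drop_frames=False)
```

In this example the second frame is disjoint, so it is counted in both calls and emptied only when `drop_frames` is set.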

kaldi.hmm.posterior_entries_are_disjoint(post_entries1:list<tuple<int, float>>, post_entries2:list<tuple<int, float>>) → bool

Returns True if the lists have no common first element (transition-id).

kaldi.hmm.read_phone_map(phone_map_rxfilename:str) → list<int>

Reads a mapping from one phone set to another.

The phone map file has lines of the form <old-phone> <new-phone>, where both entries are integers, usually nonzero (but this is not enforced).

The output vector “phone_map” will be indexed by old-phone and will contain the corresponding new-phone, or -1 for any entry that was not defined.
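The shape of the output can be illustrated with a small plain-Python parser that takes the lines of a phone map directly. `parse_phone_map` is a hypothetical helper; skipping blank lines and rejecting negative old phones are assumptions of this sketch.

```python
def parse_phone_map(lines):
    """Builds a list indexed by old-phone holding new-phone, or -1."""
    phone_map = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"bad line in phone map: {line!r}")
        old_phone, new_phone = int(fields[0]), int(fields[1])
        if old_phone < 0:
            raise ValueError(f"negative old phone: {line!r}")
        if old_phone >= len(phone_map):
            # Grow the list, marking the new slots as undefined.
            phone_map.extend([-1] * (old_phone + 1 - len(phone_map)))
        if phone_map[old_phone] not in (-1, new_phone):
            raise ValueError(f"inconsistent entries for phone {old_phone}")
        phone_map[old_phone] = new_phone
    return phone_map


phone_map = parse_phone_map(["1 1", "2 1", "4 3"])
```

Here old phones 0 and 3 never appear in the input, so their slots stay at -1.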

Parameters:

phone_map_rxfilename (str) – Extended filename for the phone map.

Returns:

Phone mapping.

Return type:

List[int]

Raises:
  • RuntimeError – if the input is invalid, e.g. there are multiple inconsistent entries for the same old phone.

kaldi.hmm.split_to_phones(trans_model:TransitionModel, alignment:list<int>) -> (success:bool, split_alignment:list<list<int>>)

Splits transition-ids in alignment into phones (one list per phone).

kaldi.hmm.vector_to_posterior_entries(log_likes:VectorBase, num_gselect:int, min_post:float) -> (log_like:float, post_entries:list<tuple<int, float>>)

Converts log-likelihoods to a list of posterior entries.

Given a vector of log-likelihoods (typically of Gaussians in a GMM but could be of pdf-ids), a number gselect >= 1 and a minimum posterior 0 <= min_post < 1, it gets the posterior for each element of log-likes by applying softmax, then prunes the posteriors using “gselect” and “min_post” (keeping at least one), and outputs the result into “post_entries”, sorted from greatest to least posterior.

Returns: The total log-likelihood (the log-sum-exp of the input log-likelihoods) and the “post_entries”.
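The steps described for this function can be sketched in plain Python as follows. This is an illustration, not the actual implementation; `to_posterior_entries` is a hypothetical helper, and renormalizing the surviving posteriors so that they sum to one is an assumption of this sketch.

```python
import math


def to_posterior_entries(log_likes, num_gselect, min_post):
    """Returns (total log-likelihood, pruned list of (index, posterior))."""
    if not log_likes or num_gselect < 1 or not 0.0 <= min_post < 1.0:
        raise ValueError("invalid arguments")
    # Softmax, computed stably by subtracting the maximum first.
    max_ll = max(log_likes)
    exps = [math.exp(x - max_ll) for x in log_likes]
    total = sum(exps)
    log_like = max_ll + math.log(total)  # log-sum-exp of the input
    entries = [(i, e / total) for i, e in enumerate(exps)]
    # Keep the num_gselect largest posteriors, greatest first.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    entries = entries[:num_gselect]
    # Drop entries below min_post, but always keep at least one.
    while len(entries) > 1 and entries[-1][1] < min_post:
        entries.pop()
    # Assumption: renormalize what is left so it sums to one.
    norm = sum(p for _, p in entries)
    entries = [(i, p / norm) for i, p in entries]
    return log_like, entries


log_likes = [math.log(0.7), math.log(0.2), math.log(0.1)]
top_two = to_posterior_entries(log_likes, num_gselect=2, min_post=0.0)
pruned = to_posterior_entries(log_likes, num_gselect=3, min_post=0.25)
```

With `min_post=0.25` only the largest entry survives, since the other two posteriors fall below the threshold.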