kaldi.hmm
HMM Topology
The following would be the text form for the “normal” HMM topology. Note that the first state is the start state, and the final state, which must have no output transitions and must be nonemitting, has an exit probability of one (no other state can have nonzero exit probability; you can treat the transition probability to the final state as an exit probability).
Note also that it’s valid to omit the “<PdfClass>” entry of the <State>, which will mean we won’t have a pdf on that state [non-emitting state]. This is equivalent to setting the <PdfClass> to -1. We do this normally just for the final state.
The Topology object can have multiple <TopologyEntry> blocks. This is useful if there are multiple types of topology in the system.
<Topology>
<TopologyEntry>
<ForPhones> 1 2 3 4 5 6 7 8 </ForPhones>
<State> 0 <PdfClass> 0
<Transition> 0 0.5
<Transition> 1 0.5
</State>
<State> 1 <PdfClass> 1
<Transition> 1 0.5
<Transition> 2 0.5
</State>
<State> 2 <PdfClass> 2
<Transition> 2 0.5
<Transition> 3 0.5
<Final> 0.5
</State>
<State> 3
</State>
</TopologyEntry>
</Topology>
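The text form above is simple enough to read with a few lines of code. The following Python sketch is illustrative only. `parse_topology` and `EXAMPLE` are hypothetical names and are not part of Kaldi or PyKaldi; the library's own reader is `HmmTopology.read`. The sketch assumes whitespace-separated tokens and handles only the tags that appear in the example. It records an omitted <PdfClass> as NO_PDF (-1), as described above.

```python
NO_PDF = -1  # pdf_class recorded when <PdfClass> is omitted (non-emitting state)

EXAMPLE = """
<Topology>
<TopologyEntry>
<ForPhones> 1 2 3 4 5 6 7 8 </ForPhones>
<State> 0 <PdfClass> 0
<Transition> 0 0.5
<Transition> 1 0.5
</State>
<State> 1 <PdfClass> 1
<Transition> 1 0.5
<Transition> 2 0.5
</State>
<State> 2 <PdfClass> 2
<Transition> 2 0.5
<Transition> 3 0.5
<Final> 0.5
</State>
<State> 3
</State>
</TopologyEntry>
</Topology>
"""


def parse_topology(text):
    """Return a list of (phones, states) pairs, one per <TopologyEntry>.

    Each state is a dict with 'pdf_class', 'transitions' ([(dest, prob)])
    and 'final' (the <Final> value, or None if absent).
    """
    tokens = text.split()
    entries = []
    phones = states = state = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "<TopologyEntry>":
            phones, states = [], []
            entries.append((phones, states))
        elif tok == "<ForPhones>":
            i += 1
            while tokens[i] != "</ForPhones>":
                phones.append(int(tokens[i]))
                i += 1
        elif tok == "<State>":
            i += 1
            # Sanity check added by this sketch: states numbered 0, 1, 2, ...
            if int(tokens[i]) != len(states):
                raise ValueError("states must be numbered 0, 1, 2, ... in order")
            state = {"pdf_class": NO_PDF, "transitions": [], "final": None}
            states.append(state)
        elif tok == "<PdfClass>":
            i += 1
            state["pdf_class"] = int(tokens[i])
        elif tok == "<Transition>":
            state["transitions"].append((int(tokens[i + 1]), float(tokens[i + 2])))
            i += 2
        elif tok == "<Final>":
            i += 1
            state["final"] = float(tokens[i])
        # Closing tags such as </State> and </TopologyEntry> need no action.
        i += 1
    return entries


phones, states = parse_topology(EXAMPLE)[0]
```

State 3 has no <PdfClass>, so it comes back with pdf_class equal to NO_PDF. This marks it as the non-emitting final state.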
NO_PDF is used where pdf_class or pdf would be used, to indicate that none is there. It is mainly useful in skippable models, but it is also used for end states.
A caveat with non-emitting states is that their out-transitions are not trainable, due to technical issues with the way we decided to accumulate the stats. Any transitions arising from (*) HMM states with NO_PDF as the label are second-class transitions: they do not have “transition-states” or “transition-ids” associated with them. They are used to create the FST version of the HMMs, where they lead to epsilon arcs.
(*) “Arising from” is a bit of a technical term here. If reorder == true, we put the transition-id associated with the outward arcs of a state on the input transition to that state.
Transition Model
The class TransitionModel is a repository for the transition probabilities. It also handles certain integer mappings.
The basic model is as follows. Each phone has an HMM topology. Each HMM-state of each of these phones has a number of transitions (and final-probs) out of it. Each HMM-state defined in the HmmTopology class has an associated “pdf_class”, which gets replaced with an actual pdf-id via the tree. The transition model associates the transition probabilities with the (phone, HMM-state, forward-pdf-id, self-loop-pdf-id) tuple, and we associate with each such tuple a transition-state. Each transition-state has a number of associated probabilities to estimate, which depends on the number of transitions/final-probs in the topology for that (phone, HMM-state). Each probability has an associated transition-index. We associate with each (transition-state, transition-index) pair a unique transition-id. Each individual probability estimated by the transition model is associated with a transition-id.
List of the various types of quantity referred to here and what they mean:
- phone: a phone index (1, 2, 3 …).
- HMM-state: a number (0, 1, 2…) that indexes TopologyEntry (see hmm-topology.h).
- pdf-id: a number output by the compute method of ContextDependency (it indexes pdfs, either forward or self-loop). Zero-based.
- transition-state: the states for which we estimate transition probabilities for transitions out of them. In some topologies, these map one-to-one with pdf-ids. One-based, since it appears on FSTs.
- transition-index: identifier of a transition (or final-prob) in the HMM. Indexes the “transitions” vector in HmmTopology.HmmState. [If it is out of range, i.e. equal to the length of transitions, it refers to the final-prob.] Zero-based.
- transition-id: identifier of a unique parameter of the TransitionModel. Associated with a (transition-state, transition-index) pair. One-based, since it appears on FSTs.
List of the possible mappings TransitionModel can do:
Forward mappings:
(phone, HMM-state, forward-pdf-id, self-loop-pdf-id) -> transition-state
(transition-state, transition-index) -> transition-id
Reverse mappings:
transition-id -> transition-state
transition-id -> transition-index
transition-state -> phone
transition-state -> HMM-state
transition-state -> forward-pdf-id
transition-state -> self-loop-pdf-id
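The forward and reverse mappings listed above can be illustrated with a small, self-contained Python toy. `ToyTransitionTables` is hypothetical and is not PyKaldi's TransitionModel. It assumes two things: transition-states are numbered by sorting the tuples, and each transition-state's transition-ids are allocated as one contiguous block. Kaldi's real allocation may differ in detail. The toy shows the one-based numbering and how the reverse mappings invert the forward ones.

```python
from typing import Dict, List, Tuple

# (phone, HMM-state, forward-pdf-id, self-loop-pdf-id)
Key = Tuple[int, int, int, int]


class ToyTransitionTables:
    """Hypothetical illustration of TransitionModel-style integer mappings."""

    def __init__(self, num_indices: Dict[Key, int]):
        # num_indices maps each tuple to the number of transitions (and
        # final-probs) out of that (phone, HMM-state) in the topology.
        self.tuples: List[Key] = sorted(num_indices)
        self._state_of = {t: s for s, t in enumerate(self.tuples, 1)}  # one-based
        # state2id[s] is the first transition-id of transition-state s;
        # index 0 is unused and the last entry is a sentinel.
        self.state2id = [0]
        next_id = 1  # transition-ids are one-based too
        for t in self.tuples:
            self.state2id.append(next_id)
            next_id += num_indices[t]
        self.state2id.append(next_id)
        # id2state[tid] is the transition-state that owns transition-id tid.
        self.id2state = [0] * next_id
        for s in range(1, len(self.tuples) + 1):
            for tid in range(self.state2id[s], self.state2id[s + 1]):
                self.id2state[tid] = s

    # Forward mappings.
    def tuple_to_transition_state(self, key: Key) -> int:
        return self._state_of[key]

    def pair_to_transition_id(self, trans_state: int, trans_index: int) -> int:
        if not 0 <= trans_index < self.num_transition_indices(trans_state):
            raise IndexError("transition-index out of range")
        return self.state2id[trans_state] + trans_index

    # Reverse mappings.
    def transition_id_to_transition_state(self, trans_id: int) -> int:
        return self.id2state[trans_id]

    def transition_id_to_transition_index(self, trans_id: int) -> int:
        return trans_id - self.state2id[self.id2state[trans_id]]

    def transition_state_to_phone(self, trans_state: int) -> int:
        return self.tuples[trans_state - 1][0]

    def num_transition_indices(self, trans_state: int) -> int:
        return self.state2id[trans_state + 1] - self.state2id[trans_state]

    def num_transition_ids(self) -> int:
        return self.state2id[-1] - 1


# Three emitting states of the example topology for phone 1 (pdf-ids 0, 1, 2 assumed).
tables = ToyTransitionTables({(1, 0, 0, 0): 2, (1, 1, 1, 1): 2, (1, 2, 2, 2): 2})
tid = tables.pair_to_transition_id(2, 1)  # -> 4
```

Only the three emitting states get transition-states here, giving transition-ids 1 to 6. The NO_PDF final state gets none, which is consistent with the note on non-emitting states.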
The main things the TransitionModel object can do are:
- Get initialized (need ContextDependency and HmmTopology objects).
- Read/write.
- Update [given a vector of counts indexed by transition-id].
- Do the various integer mappings mentioned above.
- Get the probability (or log-probability) associated with a particular transition-id.
kaldi.hmm.NO_PDF = -1
Functions
accumulate_tree_stats | Accumulates the stats needed for training context-dependency trees.
add_self_loops | Expands an FST that has been built without self-loops.
add_transition_probs | Adds transition probabilities with the supplied scales to the graph.
add_transition_probs_lat | Adds transition probabilities with the supplied scales to the lattice.
convert_alignment | Converts an alignment that was created using one model to another.
convert_alignment_with_phone_map | Converts an alignment that was created using one model to another.
convert_phnx_to_prons | Converts a phone sequence and a word sequence to a list of pronunciations.
get_h_transducer | Creates the H transducer.
get_ilabel_mapping | Produces a mapping from logical to physical HMMs.
get_pdfs_for_phones | Works out which pdfs might correspond to the given phones.
get_phones_for_pdfs | Works out which phones might correspond to the given pdfs.
merge_posteriors | Merges two Posterior objects.
posterior_entries_are_disjoint | Returns True if the lists have no common first element (transition-id).
read_phone_map | Reads a mapping from one phone set to another.
split_to_phones | Splits transition-ids in alignment into phones (one list per phone).
vector_to_posterior_entries | Converts log-likelihoods to a list of posterior entries.
Classes
AccumulateTreeStatsInfo | Alternative options representation for accumulating tree statistics.
AccumulateTreeStatsOptions | Options for accumulating tree statistics.
HTransducerConfig | Configuration options for the H transducer.
HmmTopology | HMM topology information for phones.
MapTransitionUpdateConfig | Options for MAP estimation of transition probabilities.
MleTransitionUpdateConfig | Options for MLE estimation of transition probabilities.
Posterior | Wrapper for frame posteriors.
TransitionModel | Transition model.
class kaldi.hmm.AccumulateTreeStatsInfo
Alternative options representation for accumulating tree statistics.
Parameters: opts (AccumulateTreeStatsOptions) – Options for accumulating tree statistics.
- central_position
  Central position of context-window, zero-based (default=1).
- ci_phones
  List of integer indices for context-independent phones.
- context_width
  Context window size (default=3).
- phone_map
  List of old->new phone mappings.
- var_floor
  Variance floor for tree clustering (default=0.01).
class kaldi.hmm.AccumulateTreeStatsOptions
Options for accumulating tree statistics.
- central_position
  Central position of context-window, zero-based (default=1).
- ci_phones_str
  Colon-separated list of integer indices for context-independent phones.
- context_width
  Context window size (default=3).
- phone_map_rxfilename
  Extended filename for the old->new phone mappings.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
- var_floor
  Variance floor for tree clustering (default=0.01).
class kaldi.hmm.HTransducerConfig
Configuration options for the H transducer.
- nonterm_phones_offset
  The integer index of the first non-terminal symbol.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
- transition_scale
  Scale of transition probabilities (relative to language model).
class kaldi.hmm.HmmTopology
HMM topology information for phones.
- class HmmState
  HMM state.
  - forward_pdf_class
    Forward PDF class, typically 0, 1 or 2; -1 means non-emitting.
  - from_forward_and_self_pdf(forward_pdf_class:int, self_loop_pdf_class:int) → HmmState
    Constructs a new HmmState object given PDF classes.
  - from_pdf(pdf_class:int) → HmmState
    Constructs a new HmmState object from the given PDF class. Both forward and self-loop PDFs are assigned to the same class (usual case).
  - self_loop_pdf_class
    Self-loop PDF class.
  - transitions
    List of transitions in the form (next HMM-state index, initial-transition-prob).
- check()
  Checks if HmmTopology object is valid.
  Throws: RuntimeError if the object is invalid.
- get_phone_to_num_pdf_classes() → list<int>
  Returns the number of PDF classes for each phone.
- get_phones() → list<int>
  Returns a sorted, unique list of phones covered by the topology.
- is_hmm() → bool
  Checks if HmmTopology is ‘hmm-like’.
  A topology is ‘hmm-like’ if the pdf-classes on the self-loop and forward transitions of every state are identical. [Note: in HMMs, the densities are associated with the states.] Topologies that are not ‘hmm-like’, where those pdf-classes differ, are also supported. For instance, ‘chain models’ (AKA lattice-free MMI) use 1-state topologies that have different pdf-classes for the self-loop and the forward transition, for more compact decoding graphs. Note that we always use the ‘reorder=true’ option, so the forward transition actually comes before the self-loop.
- min_length(phone:int) → int
  Returns the minimum number of frames it takes to traverse the HMM for the given phone.
- num_pdf_classes(phone:int) → int
  Returns the number of PDF classes for the given phone.
  Throws: RuntimeError if the phone is not covered by the topology.
- read(is:istream, binary:bool)
  Reads HmmTopology object from input stream.
- topology_for_phone(phone:int) → list<HmmState>
  Returns the topology entry (i.e. list of HMM states) for the given phone.
  Throws: RuntimeError if the phone is not covered by the topology.
- write(os:ostream, binary:bool)
  Writes HmmTopology object to output stream.
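Two of the HmmTopology queries above, is_hmm and min_length, can be sketched over a plain list of states. This Python sketch is hypothetical: `ToyHmmState`, `is_hmm_like` and `min_length` are illustrative names, not PyKaldi's classes. The `min_length` function assumes that each visit to an emitting state consumes one frame and that non-emitting (NO_PDF) states consume none. It finds the cheapest path from state 0 to the last state by repeatedly relaxing transitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NO_PDF = -1  # pdf-class of a non-emitting state


@dataclass
class ToyHmmState:
    """Hypothetical stand-in for HmmTopology.HmmState."""
    forward_pdf_class: int
    self_loop_pdf_class: int
    transitions: List[Tuple[int, float]] = field(default_factory=list)


def is_hmm_like(states: List[ToyHmmState]) -> bool:
    # 'hmm-like': forward and self-loop pdf-classes agree on every state.
    return all(s.forward_pdf_class == s.self_loop_pdf_class for s in states)


def min_length(states: List[ToyHmmState]) -> int:
    """Fewest frames needed to go from state 0 to the last (final) state."""
    cost = [0 if s.forward_pdf_class == NO_PDF else 1 for s in states]
    best = [float("inf")] * len(states)
    best[0] = cost[0]
    changed = True
    while changed:  # relax every transition until no distance improves
        changed = False
        for s, state in enumerate(states):
            for nxt, _prob in state.transitions:
                if best[s] + cost[nxt] < best[nxt]:
                    best[nxt] = best[s] + cost[nxt]
                    changed = True
    if best[-1] == float("inf"):
        raise ValueError("final state is unreachable")
    return int(best[-1])


# The 3-state topology from the text form above, plus its non-emitting final state.
three_state = [
    ToyHmmState(0, 0, [(0, 0.5), (1, 0.5)]),
    ToyHmmState(1, 1, [(1, 0.5), (2, 0.5)]),
    ToyHmmState(2, 2, [(2, 0.5), (3, 0.5)]),
    ToyHmmState(NO_PDF, NO_PDF),
]
# A 1-state 'chain'-style topology: different forward and self-loop pdf-classes.
chain = [ToyHmmState(0, 1, [(0, 0.5), (1, 0.5)]), ToyHmmState(NO_PDF, NO_PDF)]
```

Self-loops never lower the cost, so the 3-state topology needs at least 3 frames and the chain-style one needs 1. Only the chain-style one fails the ‘hmm-like’ test.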
class kaldi.hmm.MapTransitionUpdateConfig
Options for MAP estimation of transition probabilities.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
- Share all transition parameters where the states have the same PDF.
- tau
  Tau value for MAP estimation of transition probabilities.
class kaldi.hmm.MleTransitionUpdateConfig
Options for MLE estimation of transition probabilities.
- floor
  Floor for transition probabilities.
- mincount
  Minimum count required to update transitions from a state.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
- Share all transition parameters where the states have the same PDF.
class kaldi.hmm.Posterior
Wrapper for frame posteriors.
We wrap frame posteriors (of type list<list<tuple<int, float>>>) with this class to avoid copying frame posteriors every time a function that accepts frame posteriors is called, and also to provide a more Pythonic interface.
- convert_transitions_to_pdfs(tmodel:TransitionModel) → Posterior
  Converts posteriors over transition-ids to posteriors over pdf-ids.
- convert_transitions_to_phones(tmodel:TransitionModel) → Posterior
  Converts posteriors over transition-ids to posteriors over phones.
- from_alignment(ali:list<int>) → Posterior
  Creates a new Posterior object from an alignment.
- from_posteriors(post:list<list<tuple<int, float>>>) → Posterior
  Creates a new Posterior object from frame posteriors.
- get_posteriors() → list<list<tuple<int, float>>>
  Returns the frame posteriors.
- read(is:istream, binary:bool)
  Reads Posterior object from input stream.
- scale(scale:float)
  Scales frame posteriors.
- sort_by_pdfs(tmodel:TransitionModel)
  Sorts posterior entries by pdf-ids. At the end of this operation, posterior entries for transition-ids with the same pdf-ids are next to each other.
- to_matrix(post_dim:int) → Matrix
  Converts frame posteriors to a matrix. The number of matrix rows is the same as the length of the posterior, and the number of matrix columns is given by ‘post_dim’. Elements that are not specified in the posterior are equal to zero.
- to_pdf_matrix(model:TransitionModel) → Matrix
  Converts frame posteriors to a matrix. The number of matrix rows is the same as the length of the posterior, and the number of matrix columns is the number of PDFs in the transition model. Elements that are not specified in the posterior are equal to zero.
- total() → float
  Returns the total of all the weights in the frame posteriors.
- weight_silence(tmodel:TransitionModel, silence_set:ConstIntegerSet, silence_scale:float)
  Weights silence phones. Any silence phone in the frame posteriors (i.e. any phone in the set “silence_set”) is weighted by “silence_scale”.
- weight_silence_distributed(tmodel:TransitionModel, silence_set:ConstIntegerSet, silence_scale:float)
  Weights silence phones. This is similar to weight_silence(), except that on each frame it works out the amount by which the overall posterior would be reduced, and scales down everything on that frame by the same amount. The effect is that frames that are mostly silence get down-weighted.
- write(os:ostream, binary:bool)
  Writes Posterior object to output stream.
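The posterior layout described above is a list of frames, each a list of (index, weight) pairs. With that layout, to_matrix, weight_silence and weight_silence_distributed are easy to sketch in plain Python. These functions are hypothetical stand-ins, not the Posterior methods. A plain dict `tid_to_phone` replaces the TransitionModel lookup, and a Python set replaces ConstIntegerSet.

```python
from typing import Dict, List, Set, Tuple

Frame = List[Tuple[int, float]]  # (transition-id, weight) pairs for one frame


def to_matrix(post: List[Frame], post_dim: int) -> List[List[float]]:
    """Dense len(post) x post_dim matrix; unspecified elements stay zero."""
    mat = [[0.0] * post_dim for _ in post]
    for t, frame in enumerate(post):
        for idx, weight in frame:
            if not 0 <= idx < post_dim:
                raise IndexError(f"index {idx} out of range for post_dim={post_dim}")
            mat[t][idx] += weight  # duplicated indices are summed in this sketch
    return mat


def weight_silence(post: List[Frame], tid_to_phone: Dict[int, int],
                   silence_set: Set[int], silence_scale: float) -> List[Frame]:
    """Scale only the entries whose phone is in silence_set."""
    return [
        [(tid, w * silence_scale if tid_to_phone[tid] in silence_set else w)
         for tid, w in frame]
        for frame in post
    ]


def weight_silence_distributed(post: List[Frame], tid_to_phone: Dict[int, int],
                               silence_set: Set[int],
                               silence_scale: float) -> List[Frame]:
    """Scale every entry of a frame by one factor derived from its silence mass."""
    out = []
    for frame in post:
        sil = sum(w for tid, w in frame if tid_to_phone[tid] in silence_set)
        total = sum(w for _, w in frame)
        if total == 0.0:
            out.append(list(frame))
            continue
        # Ratio of the frame total after scaling only silence to the original total.
        frame_scale = (sil * silence_scale + (total - sil)) / total
        out.append([(tid, w * frame_scale) for tid, w in frame])
    return out


post = [[(1, 0.6), (2, 0.4)], [(1, 1.0)]]
tid_to_phone = {1: 1, 2: 5}  # hypothetical: transition-id 1 belongs to silence phone 1
```

Take silence_scale=0.5 and a frame that is 60% silence. The distributed variant keeps 70% of the frame's mass and spreads that reduction over all of its entries. By contrast, weight_silence halves only the silence entry.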
class kaldi.hmm.TransitionModel
Transition model.
- accumulate(prob:float, trans_id:int, stats:DoubleVector)
  Accumulates statistics.
- compatible(other:TransitionModel) → bool
  Checks if this transition model is compatible with another.
- from_topo(ctx_dep:ContextDependencyInterface, hmm_topo:HmmTopology) → TransitionModel
  Creates a new TransitionModel object.
  Parameters:
  - ctx_dep (ContextDependencyInterface) – Context dependency decision tree.
  - hmm_topo (HmmTopology) – HMM topology.
- get_non_self_loop_log_prob(trans_id:int) → float
  Returns the log of the non-self-loop probability mass for the given transition.
- get_phones() → list<int>
  Returns a sorted, unique list of phones.
- get_topo() → HmmTopology
  Returns the HMM topology.
- get_transition_log_prob(trans_id:int) → float
  Returns the log probability associated with the given transition-id.
- get_transition_log_prob_ignoring_self_loops(trans_id:int) → float
  Returns the log probability associated with the given transition-id if the self-loop is ignored.
  Returns the log-probability of a particular non-self-loop transition after subtracting the probability mass of the self-loop and renormalizing. Specifically, for non-self-loops it returns log(prob-for-transition / (1 - prob-for-self-loop)).
  Raises: RuntimeError if called on a self-loop.
- get_transition_prob(trans_id:int) → float
  Returns the probability associated with the given transition-id.
- init_stats(stats:DoubleVector)
  Initializes statistics.
- is_final(trans_id:int) → bool
  Returns True if this transition-id goes to the final state.
- is_self_loop(trans_id:int) → bool
  Returns True if this transition-id corresponds to a self-loop.
- map_update(stats:DoubleVector, cfg:MapTransitionUpdateConfig) -> (objf_impr_out:float, count_out:float)
  Does Maximum A Posteriori estimation. The stats are counts/weights, indexed by transition-id.
- mle_update(stats:DoubleVector, cfg:MleTransitionUpdateConfig) -> (objf_impr_out:float, count_out:float)
  Does Maximum Likelihood estimation. The stats are counts/weights, indexed by transition-id.
- num_pdfs() → int
  Returns the highest numbered PDF we ever saw plus one.
- num_phones() → int
  Returns the highest phone index present.
- num_transition_ids() → int
  Returns the total number of transition-ids.
- num_transition_indices(trans_state:int) → int
  Returns the number of transition-indices for the given transition-state.
- num_transition_states() → int
  Returns the total number of transition-states.
- pair_to_transition_id(trans_state:int, trans_index:int) → int
  Maps (trans-state, trans-index) pair to transition-id.
- print_model(os:ostream, phone_names:list<str>, occs:DoubleVector=default)
  Prints a human-readable representation of the transition model.
- read(is:istream, binary:bool)
  Reads TransitionModel object from input stream.
- self_loop_of(trans_state:int) → int
  Returns the self-loop transition-id, or zero if this state does not have a self-loop.
- transition_id_to_hmm_state(trans_id:int) → int
  Maps transition-id to hmm-state.
- transition_id_to_pdf(trans_id:int) → int
  Maps transition-id to pdf-id.
- transition_id_to_pdf_class(trans_id:int) → int
  Maps transition-id to pdf-class.
- transition_id_to_pdf_fast(trans_id:int) → int
  Maps transition-id to pdf-id (faster, skips an assertion).
- transition_id_to_phone(trans_id:int) → int
  Maps transition-id to phone.
- transition_id_to_transition_index(trans_id:int) → int
  Maps transition-id to transition-index.
- transition_id_to_transition_state(trans_id:int) → int
  Maps transition-id to transition-state.
- transition_state_to_forward_pdf(trans_state:int) → int
  Maps transition-state to forward-pdf-id.
- transition_state_to_forward_pdf_class(trans_state:int) → int
  Maps transition-state to forward-pdf-class.
- transition_state_to_hmm_state(trans_state:int) → int
  Maps transition-state to hmm-state.
- transition_state_to_phone(trans_state:int) → int
  Maps transition-state to phone.
- transition_state_to_self_loop_pdf(trans_state:int) → int
  Maps transition-state to self-loop-pdf-id.
- transition_state_to_self_loop_pdf_class(trans_state:int) → int
  Maps transition-state to self-loop-pdf-class.
- tuple_to_transition_state(phone:int, hmm_state:int, pdf:int, self_loop_pdf:int) → int
  Maps (phone, hmm-state, forward-pdf-id, self-loop-pdf-id) tuple to transition-state.
- write(os:ostream, binary:bool)
  Writes TransitionModel object to output stream.
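The renormalization in get_transition_log_prob_ignoring_self_loops can be checked numerically. The sketch below works on a hypothetical list of probabilities for a single transition-state, not on a TransitionModel. It assumes the self-loop's position in that list is known. The quantity log(1 - p_self) subtracted here is the non-self-loop log mass described for get_non_self_loop_log_prob.

```python
import math
from typing import List


def non_self_loop_log_prob(probs: List[float], self_loop_index: int) -> float:
    """log of the probability mass not on the self-loop, i.e. log(1 - p_self)."""
    return math.log(1.0 - probs[self_loop_index])


def log_prob_ignoring_self_loops(probs: List[float], index: int,
                                 self_loop_index: int) -> float:
    """log(p_index / (1 - p_self)) for a non-self-loop transition."""
    if index == self_loop_index:
        raise RuntimeError("not defined for the self-loop")
    # Subtracting in the log domain is the same as dividing by (1 - p_self).
    return math.log(probs[index]) - non_self_loop_log_prob(probs, self_loop_index)


# Hypothetical transition-state: self-loop 0.5, two forward transitions 0.3 and 0.2.
probs = [0.5, 0.3, 0.2]
renormalized = [math.exp(log_prob_ignoring_self_loops(probs, i, 0)) for i in (1, 2)]
# renormalized is approximately [0.6, 0.4], which sums to one.
```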
kaldi.hmm.accumulate_tree_stats(trans_model:TransitionModel, info:AccumulateTreeStatsInfo, alignment:list<int>, features:Matrix) → dict<list<tuple<int, int>>, GaussClusterable>
Accumulates the stats needed for training context-dependency trees.
kaldi.hmm.add_self_loops(trans_model:TransitionModel, disambig_syms:list<int>, self_loop_scale:float, reorder:bool, check_no_self_loops:bool, fst:StdVectorFst)
Expands an FST that has been built without self-loops.
kaldi.hmm.add_transition_probs(trans_model:TransitionModel, disambig_syms:list<int>, transition_scale:float, self_loop_scale:float, fst:StdVectorFst)
Adds transition probabilities with the supplied scales to the graph.
kaldi.hmm.add_transition_probs_lat(trans_model:TransitionModel, transition_scale:float, self_loop_scale:float, fst:LatticeVectorFst)
Adds transition probabilities with the supplied scales to the lattice.
kaldi.hmm.convert_alignment(old_trans_model:TransitionModel, new_trans_model:TransitionModel, new_ctx_dep:ContextDependencyInterface, old_alignment:list<int>, subsample_factor:int, repeat_frames:bool, reorder:bool) -> (success:bool, new_alignment:list<int>)
Converts an alignment that was created using one model to another.
kaldi.hmm.convert_alignment_with_phone_map(old_trans_model:TransitionModel, new_trans_model:TransitionModel, new_ctx_dep:ContextDependencyInterface, old_alignment:list<int>, subsample_factor:int, repeat_frames:bool, reorder:bool, phone_map:list<int>) -> (success:bool, new_alignment:list<int>)
Converts an alignment that was created using one model to another.
kaldi.hmm.convert_phnx_to_prons(phnx:list<int>, words:list<int>, word_start_sym:int, word_end_sym:int) -> (success:bool, prons:list<list<int>>)
Converts a phone sequence and a word sequence to a list of pronunciations.
kaldi.hmm.get_h_transducer(ilabel_info:list<list<int>>, ctx_dep:ContextDependencyInterface, trans_model:TransitionModel, config:HTransducerConfig) -> (h_transducer:StdVectorFst, disambig_syms_left:list<int>)
Creates the H transducer.
kaldi.hmm.get_ilabel_mapping(ilabel_info_old:list<list<int>>, ctx_dep:ContextDependencyInterface, trans_model:TransitionModel) → list<int>
Produces a mapping from logical to physical HMMs.
kaldi.hmm.get_pdfs_for_phones(trans_model:TransitionModel, phones:list<int>) -> (ret:bool, pdfs:list<int>)
Works out which pdfs might correspond to the given phones.
Parameters:
- trans_model (TransitionModel) – Transition model used to work out this information.
- phones (List[int]) – A sorted, unique vector that represents a set of phones.
Returns: The first return value is True if the returned pdf-ids correspond to just the given set of phones, and False if they may be shared with other phones. The second return value is a sorted, unique list of pdf-ids that correspond to the given set of phones.
Return type: Tuple[bool, List[int]]
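The True/False flag returned by get_pdfs_for_phones can be understood through a small Python sketch. The sketch is hypothetical: instead of a TransitionModel, it takes a plain list of (phone, HMM-state, forward-pdf-id, self-loop-pdf-id) tuples, one per transition-state. It reports the pdf set as exclusive only when no transition-state belonging to another phone uses any of those pdfs.

```python
from typing import List, Sequence, Tuple

# One entry per transition-state: (phone, HMM-state, forward-pdf-id, self-loop-pdf-id).
ToyTuple = Tuple[int, int, int, int]


def pdfs_for_phones(tuples: Sequence[ToyTuple],
                    phones: Sequence[int]) -> Tuple[bool, List[int]]:
    phone_set = set(phones)
    # Collect every forward and self-loop pdf used by the requested phones.
    pdfs = set()
    for phone, _hmm_state, fwd, sl in tuples:
        if phone in phone_set:
            pdfs.update((fwd, sl))
    # Exclusive only if every transition-state touching these pdfs is ours.
    exclusive = all(
        phone in phone_set
        for phone, _hmm_state, fwd, sl in tuples
        if fwd in pdfs or sl in pdfs
    )
    return exclusive, sorted(pdfs)


# Hypothetical model: phones 2 and 3 share pdf 2 (for example, through tree clustering).
toy = [(1, 0, 0, 0), (1, 1, 1, 1), (2, 0, 2, 2), (3, 0, 2, 2)]
```

Here phone 1's pdfs are exclusive to it. Asking about phone 2 alone yields pdf 2 with the flag set to False, because phone 3 also uses that pdf.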
kaldi.hmm.get_phones_for_pdfs(trans_model:TransitionModel, pdfs:list<int>) -> (ret:bool, phones:list<int>)
Works out which phones might correspond to the given pdfs.
Parameters:
- trans_model (TransitionModel) – Transition model used to work out this information.
- pdfs (List[int]) – A sorted, unique vector that represents a set of pdfs.
Returns: The first return value is True if the returned phones correspond to just the given set of pdfs, and False if they may be shared with other pdfs. The second return value is a sorted, unique list of phones that correspond to the given set of pdfs.
Return type: Tuple[bool, List[int]]
kaldi.hmm.merge_posteriors(post1:Posterior, post2:Posterior, merge:bool, drop_frames:bool) -> (num_frames:int, post_out:Posterior)
Merges two Posterior objects.
Inputs must have the same number of frames. If “merge” is true, it makes a common entry whenever there are duplicated entries, adding up the weights. If “drop_frames” is true, then for frames where the two sets of posteriors were originally disjoint, it makes no entries for that frame (this relates to frame dropping, or drop_frames; see Vesely et al., ICASSP 2013). It also returns the number of frames for which the two posteriors were disjoint (i.e. had no common transition-ids, or whatever index we are using).
kaldi.hmm.posterior_entries_are_disjoint(post_entries1:list<tuple<int, float>>, post_entries2:list<tuple<int, float>>) → bool
Returns True if the lists have no common first element (transition-id).
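The merge and drop_frames semantics described for merge_posteriors and posterior_entries_are_disjoint can be sketched on plain lists of (index, weight) pairs. This is a hypothetical illustration, not PyKaldi's implementation. In particular, sorting merged entries by index is a choice made in this sketch.

```python
from collections import defaultdict
from typing import List, Tuple

Frame = List[Tuple[int, float]]  # (transition-id, weight) pairs for one frame


def entries_are_disjoint(a: Frame, b: Frame) -> bool:
    """True if the two frames share no first element (transition-id)."""
    return not ({i for i, _ in a} & {i for i, _ in b})


def merge_posteriors(post1: List[Frame], post2: List[Frame], merge: bool,
                     drop_frames: bool) -> Tuple[int, List[Frame]]:
    if len(post1) != len(post2):
        raise ValueError("inputs must have the same number of frames")
    out: List[Frame] = []
    num_disjoint = 0
    for f1, f2 in zip(post1, post2):
        frame = list(f1) + list(f2)
        if merge:
            # One entry per index, with the weights of duplicates added up.
            summed = defaultdict(float)
            for idx, w in frame:
                summed[idx] += w
            frame = sorted(summed.items())
        if entries_are_disjoint(f1, f2):
            num_disjoint += 1
            if drop_frames:
                frame = []  # no entries for frames where the inputs disagree entirely
        out.append(frame)
    return num_disjoint, out


post1 = [[(1, 0.5)], [(2, 1.0)]]
post2 = [[(1, 0.25)], [(3, 1.0)]]
```

In this example the second frame is disjoint, so it is counted once. With drop_frames=True, that frame comes back empty.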
kaldi.hmm.read_phone_map(phone_map_rxfilename:str) → list<int>
Reads a mapping from one phone set to another.
The phone map file has lines of the form <old-phone> <new-phone>, where both entries are integers, usually nonzero (but this is not enforced).
The output vector “phone_map” will be indexed by old-phone and will contain the corresponding new-phone, or -1 for any entry that was not defined.
Parameters: phone_map_rxfilename (str) – Extended filename for the phone map.
Returns: Phone mapping.
Return type: List[int]
Raises: RuntimeError if the input is invalid, e.g. there are multiple inconsistent entries for the same old phone.
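The file format and output layout described above can be sketched as a small Python parser. `parse_phone_map` is hypothetical. It takes an iterable of lines rather than an extended filename, and it skips blank lines (an assumption of this sketch). It raises RuntimeError on lines with the wrong number of fields, on negative old-phones, or on inconsistent duplicate entries. Non-integer fields surface as Python's ValueError.

```python
from typing import Iterable, List, Tuple


def parse_phone_map(lines: Iterable[str]) -> List[int]:
    """Return a list indexed by old-phone, holding new-phone or -1 if undefined."""
    pairs: List[Tuple[int, int]] = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            continue  # blank lines are skipped in this sketch
        if len(fields) != 2:
            raise RuntimeError(f"line {lineno}: expected '<old-phone> <new-phone>'")
        old, new = int(fields[0]), int(fields[1])
        if old < 0:
            raise RuntimeError(f"line {lineno}: negative old-phone {old}")
        pairs.append((old, new))
    if not pairs:
        return []
    phone_map = [-1] * (max(old for old, _ in pairs) + 1)
    for old, new in pairs:
        # A repeated old-phone is fine only if it maps to the same new-phone.
        if phone_map[old] not in (-1, new):
            raise RuntimeError(f"inconsistent entries for old phone {old}")
        phone_map[old] = new
    return phone_map
```

For example, the lines "1 1", "2 1" and "4 3" give [-1, 1, 1, -1, 3]. Phones 0 and 3 were never defined, so they hold -1.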
kaldi.hmm.split_to_phones(trans_model:TransitionModel, alignment:list<int>) -> (success:bool, split_alignment:list<list<int>>)
Splits transition-ids in alignment into phones (one list per phone).
kaldi.hmm.vector_to_posterior_entries(log_likes:VectorBase, num_gselect:int, min_post:float) -> (log_like:float, post_entries:list<tuple<int, float>>)
Converts log-likelihoods to a list of posterior entries.
Given a vector of log-likelihoods (typically of Gaussians in a GMM, but could be of pdf-ids), a number num_gselect >= 1 and a minimum posterior 0 <= min_post < 1, it gets the posterior for each element of log_likes by applying softmax, then prunes the posteriors using “num_gselect” and “min_post” (keeping at least one), and outputs the result into “post_entries”, sorted from greatest to least posterior.
Returns: The total log-likelihood (the softmax output) and the “post_entries”.
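The softmax-then-prune procedure described above can be sketched in plain Python. This sketch follows the prose description rather than Kaldi's source. Three details are assumptions of the sketch, not documented behavior:

- the order in which the num_gselect and min_post pruning is applied;
- renormalizing the kept posteriors so they sum to one;
- returning the log-sum-exp of all inputs as the total log-likelihood.

```python
import math
from typing import List, Sequence, Tuple


def vector_to_posterior_entries_sketch(
        log_likes: Sequence[float], num_gselect: int,
        min_post: float) -> Tuple[float, List[Tuple[int, float]]]:
    if num_gselect < 1 or not 0.0 <= min_post < 1.0 or not log_likes:
        raise ValueError("need num_gselect >= 1, 0 <= min_post < 1, non-empty input")
    max_like = max(log_likes)
    # Numerically stable softmax: subtract the max before exponentiating.
    exps = [math.exp(x - max_like) for x in log_likes]
    denom = sum(exps)
    total_log_like = max_like + math.log(denom)  # log-sum-exp of all inputs
    posts = [e / denom for e in exps]
    # Indices from greatest to least posterior, truncated to num_gselect.
    ranked = sorted(range(len(posts)), key=lambda i: posts[i], reverse=True)
    ranked = ranked[:num_gselect]
    # Drop entries below min_post, but always keep at least the best one.
    kept = [i for i in ranked if posts[i] >= min_post] or ranked[:1]
    kept_total = sum(posts[i] for i in kept)
    return total_log_like, [(i, posts[i] / kept_total) for i in kept]


log_like, entries = vector_to_posterior_entries_sketch([0.0, math.log(3.0)], 2, 0.1)
```

Here the softmax gives posteriors of about 0.25 and 0.75, so both survive a min_post of 0.1. The entries come back with index 1 first, and the total log-likelihood is log 4.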