kaldi.lm
Functions
build_const_arpa_lm – Builds a constant ARPA language model from an ARPA language model.
Classes
ArpaLmCompiler – ARPA LM compiler.
ArpaParseOptions – Options for parsing ARPA files.
ConstArpaLm – Constant ARPA language model.
ConstArpaLmDeterministicFst – Deterministic on-demand constant ARPA language model.
KaldiRnnlmWrapper – Mikolov RNNLM wrapper.
KaldiRnnlmWrapperOpts – Options for wrapping a Mikolov RNNLM.
RnnlmDeterministicFst – Deterministic on-demand Mikolov RNNLM FST.
class kaldi.lm.ArpaLmCompiler
ARPA LM compiler.
Constructs the ARPA LM compiler with the given options and an optional symbol table.
If a symbol table is provided, the ARPA file to be read must contain text n-grams, and words are mapped to labels using the table. The bos_symbol and eos_symbol in the options must be valid labels in the table, as must unk_symbol if provided. The compiler does not own the table, but it may add entries to it if oov_handling is set to ArpaFileParser.OovHandling.ADD_TO_SYMBOLS.
If the symbol table is None, the ARPA file must contain integer-label n-grams and oov_handling has no effect; bos_symbol and eos_symbol must still be valid labels.
Parameters:
- options (ArpaParseOptions) – The options for parsing ARPA files.
- sub_eps (int) – The disambiguation symbol to substitute for epsilon. If set to 0, bos_symbol and eos_symbol are treated as real symbols; otherwise they are treated as epsilons.
- symbols (SymbolTable) – The symbol table, or None.
- fst() → StdVectorFst
  Returns a copy of the FST compiled from the ARPA LM file.
- mutable_fst() → StdVectorFst
  Returns the internal FST compiled from the ARPA LM file.
- options() → ArpaParseOptions
  Gets the ARPA parser options.
  Returns: The ARPA parser options.
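The constructor description above distinguishes text n-grams, which are mapped through a symbol table, from integer-label n-grams. As a rough illustration of what reading text n-grams involves, here is a minimal pure-Python sketch. `parse_arpa` is a hypothetical helper (it is not part of kaldi.lm), a plain dict stands in for SymbolTable, OOV words simply raise KeyError, and the base-10 log values are kept exactly as they appear in the file.

```python
# Hypothetical sketch, not part of kaldi.lm: parse the n-gram sections of an
# ARPA file and map each word to an integer label via a dict that stands in
# for a SymbolTable.
def parse_arpa(text, symbols):
    """Return {label tuple: (log10 prob, log10 backoff)} for every n-gram."""
    ngrams = {}
    order = 0  # n-gram order of the section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue  # skip blank lines, section markers and the count header
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            continue
        fields = line.split()
        words = fields[1:1 + order]
        # The backoff weight is optional; a missing value means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        labels = tuple(symbols[w] for w in words)  # KeyError on OOV words
        ngrams[labels] = (float(fields[0]), backoff)
    return ngrams


ARPA = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-99 <s> -0.5
-0.5 a -0.3
-0.7 </s>

\\2-grams:
-0.2 <s> a
-0.1 a </s>

\\end\\
"""

SYMBOLS = {"<eps>": 0, "<s>": 1, "a": 2, "</s>": 3}
ngrams = parse_arpa(ARPA, SYMBOLS)
print(ngrams[(1, 2)])  # (-0.2, 0.0)
```

With integer-label input, the lookup `symbols[w]` would instead be a plain `int(w)`.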
class kaldi.lm.ArpaParseOptions
Options for parsing ARPA files.
- OovHandling
  Alias of ArpaParseOptions.OovHandling.
- bos_symbol
  Symbol for <s>. Required; must be non-epsilon.
- eos_symbol
  Symbol for </s>. Required; must be non-epsilon.
- max_warnings
  Maximum number of warnings to report; a negative value means unlimited.
- oov_handling
  How to handle out-of-vocabulary (OOV) words in the file.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
- unk_symbol
  Symbol for <unk>. Required when oov_handling is OovHandling.REPLACE_WITH_UNK.
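The interaction between oov_handling and unk_symbol can be sketched in plain Python. This is a hypothetical illustration, not the kaldi.lm implementation: `OovMode` is an invented stand-in for ArpaParseOptions.OovHandling that models only the two modes named on this page plus an error fallback, and a dict stands in for the symbol table.

```python
from enum import Enum


class OovMode(Enum):
    """Hypothetical stand-in for ArpaParseOptions.OovHandling."""
    RAISE_ERROR = 0       # fallback used by this sketch for unhandled OOVs
    ADD_TO_SYMBOLS = 1    # grow the symbol table with the new word
    REPLACE_WITH_UNK = 2  # map the word to unk_symbol


def map_word(word, symbols, mode, unk_symbol=None):
    """Return the label for word, treating OOV words according to mode."""
    if word in symbols:
        return symbols[word]
    if mode is OovMode.ADD_TO_SYMBOLS:
        symbols[word] = max(symbols.values()) + 1  # augments the caller's table
        return symbols[word]
    if mode is OovMode.REPLACE_WITH_UNK:
        if unk_symbol is None:
            raise ValueError("unk_symbol is required for REPLACE_WITH_UNK")
        return unk_symbol
    raise KeyError(f"out-of-vocabulary word: {word!r}")


SYMS = {"<eps>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "a": 4}
print(map_word("b", dict(SYMS), OovMode.REPLACE_WITH_UNK, unk_symbol=3))  # 3
table = dict(SYMS)
print(map_word("b", table, OovMode.ADD_TO_SYMBOLS), table["b"])  # 5 5
```

The ADD_TO_SYMBOLS branch mutates the table it is given, which matches the ArpaLmCompiler note that a table the compiler does not own may still be augmented.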
class kaldi.lm.ConstArpaLm
Constant ARPA language model.
- get_bos_symbol() → int
  Returns the label for the <s> symbol.
- get_eos_symbol() → int
  Returns the label for the </s> symbol.
- get_ngram_log_prob(word:int, hist:list<int>) → float
  Gets the n-gram log probability of word given the history hist.
  It first maps out-of-vocabulary words to <unk>, if <unk> is defined, and then computes the n-gram log probability.
- get_ngram_order() → int
  Returns the n-gram order.
- get_unk_symbol() → int
  Returns the label for the <unk> symbol.
- history_state_exists(hist:list<int>) → bool
  Checks whether a history state exists in the model.
  Returns: True if the history word sequence hist has a successor in the model, which means hist will be a state in the FST-format language model.
- read(is:istream, binary:bool)
  Reads the model from an input stream.
- write(os:ostream, binary:bool)
  Writes the model to an output stream.
- write_arpa(os:ostream)
  Writes the model to an output stream as an ARPA file.
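The backoff behaviour behind get_ngram_log_prob and history_state_exists can be sketched in pure Python. `ToyConstArpaLm` below is hypothetical: it stores a dict of n-grams (label tuples, oldest word first) mapped to natural-log probability and backoff pairs and applies standard ARPA backoff. It is not the packed representation ConstArpaLm actually uses, and the oldest-first history ordering is an assumption of the sketch.

```python
import math


class ToyConstArpaLm:
    """Hypothetical dict-backed model; n-gram -> (log prob, log backoff)."""

    def __init__(self, ngrams, unk=None):
        self.ngrams = ngrams
        self.unk = unk
        # The history of each n-gram (all but its last word) has a successor.
        self.histories = {g[:-1] for g in ngrams}

    def get_ngram_log_prob(self, word, hist):
        if (word,) not in self.ngrams and self.unk is not None:
            word = self.unk  # map OOV words to <unk> when it is defined
        hist = tuple(hist)
        backoff = 0.0
        while True:
            if hist + (word,) in self.ngrams:
                return backoff + self.ngrams[hist + (word,)][0]
            if not hist:
                return -math.inf  # word unseen even as a unigram
            # Back off: add the history's backoff weight (0 if the history is
            # not itself an n-gram) and drop the oldest history word.
            if hist in self.ngrams:
                backoff += self.ngrams[hist][1]
            hist = hist[1:]

    def history_state_exists(self, hist):
        return tuple(hist) in self.histories


A, B, UNK = 4, 5, 3
lm = ToyConstArpaLm({
    (A,): (math.log(0.5), math.log(0.6)),
    (B,): (math.log(0.3), 0.0),
    (UNK,): (math.log(0.2), 0.0),
    (A, B): (math.log(0.9), 0.0),
}, unk=UNK)
print(math.exp(lm.get_ngram_log_prob(UNK, [A])))  # backs off: 0.6 * 0.2, about 0.12
print(lm.history_state_exists([A]), lm.history_state_exists([B]))  # True False
```

Here [B] is not a history state because no bigram starts with B, so an FST built from this model would never need a state for it.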
class kaldi.lm.ConstArpaLmDeterministicFst
Deterministic on-demand constant ARPA language model.
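- final(state:int) → TropicalWeight
  Returns the final weight of the given state.
- get_arc(s:int, ilabel:int) → (success:bool, oarc:StdArc)
  Creates an arc on demand and returns it.
  Parameters:
  - s (int) – The state the arc leaves.
  - ilabel (int) – The input label of the arc.
  Returns: A tuple (success, oarc), where success indicates whether an arc with input label ilabel leaves state s and oarc is the created arc.
- start() → int
  Returns the start state index.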
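An on-demand deterministic FST only materialises states and arcs as they are requested. The sketch below illustrates that idea for a toy bigram model. `ToyArpaDeterministicFst` and its helpers are hypothetical, arcs are plain (ilabel, olabel, weight, nextstate) tuples rather than StdArc objects, and weights are tropical costs, i.e. negated log probabilities. It assumes each state stands for the last n-1 words of history, starting from <s>; the real class may keep its state bookkeeping differently.

```python
import math

BOS, EOS, A, B = 1, 2, 3, 4
ORDER = 2  # bigram model, so a state remembers one word of history
# n-gram (oldest word first) -> (natural-log prob, natural-log backoff)
LM = {
    (BOS,): (-math.inf, 0.0),  # <s> is never predicted
    (EOS,): (math.log(0.3), 0.0),
    (A,): (math.log(0.4), math.log(0.5)),
    (B,): (math.log(0.3), 0.0),
    (BOS, A): (math.log(0.8), 0.0),
    (A, EOS): (math.log(0.6), 0.0),
}


def log_prob(word, hist):
    """Standard ARPA backoff lookup over LM."""
    hist, backoff = tuple(hist), 0.0
    while True:
        if hist + (word,) in LM:
            return backoff + LM[hist + (word,)][0]
        if not hist:
            return -math.inf
        backoff += LM[hist][1] if hist in LM else 0.0
        hist = hist[1:]


class ToyArpaDeterministicFst:
    def __init__(self):
        self.state_to_hist = [(BOS,)]  # state 0 is the start state
        self.hist_to_state = {(BOS,): 0}

    def start(self):
        return 0

    def final(self, state):
        # Final cost is the cost of emitting </s> from this history.
        return -log_prob(EOS, self.state_to_hist[state])

    def get_arc(self, s, ilabel):
        hist = self.state_to_hist[s]
        lp = log_prob(ilabel, hist)
        if lp == -math.inf:
            return False, None  # no arc for this label
        full = hist + (ilabel,)
        nxt = full[max(0, len(full) - (ORDER - 1)):]  # keep ORDER-1 words
        if nxt not in self.hist_to_state:  # create the state on demand
            self.hist_to_state[nxt] = len(self.state_to_hist)
            self.state_to_hist.append(nxt)
        return True, (ilabel, ilabel, -lp, self.hist_to_state[nxt])


fst = ToyArpaDeterministicFst()
ok, arc = fst.get_arc(fst.start(), A)
print(ok, arc[3])  # True 1
```

Because every label leads to exactly one next state, the FST is deterministic by construction, which is what lets a decoder query it one arc at a time.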
class kaldi.lm.KaldiRnnlmWrapper
Mikolov RNNLM wrapper.
- get_eos() → int
  Returns the label for the EOS symbol.
Returns the size of the hidden layer.
- get_log_prob(word:int, wseq:list<int>, context_in:list<float>) → (logprob:float, context_out:list<float>)
  Computes the log probability of word given the history wseq and the initial context vector context_in. Returns the log probability and the final context vector.
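Because get_log_prob takes a context vector in and hands one back, a caller can score a sentence incrementally by feeding each call's context_out into the next call's context_in. The toy below illustrates only that threading pattern: the weights are made up, the recurrence is a plain Elman-style tanh layer rather than Mikolov's RNNLM, and the convention that context_in summarises everything before the last history word is an assumption of this sketch.

```python
import math

# Toy recurrent LM over labels 0..2 (label 0 plays the role of <s>) with two
# hidden units. The weights are arbitrary and untrained.
W_IN = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]  # input word -> hidden
W_REC = [[0.6, 0.0], [0.0, 0.6]]               # previous hidden -> hidden
W_OUT = [[1.0, -1.0, 0.5], [0.2, 0.3, -0.4]]   # hidden -> vocabulary scores


def get_log_prob(word, wseq, context_in):
    """Return (log P(word | wseq), context_out) for the toy model."""
    prev, n = wseq[-1], len(context_in)
    # Consume only the last history word; context_in carries the rest.
    hidden = [
        math.tanh(W_IN[prev][j] + sum(context_in[k] * W_REC[k][j] for k in range(n)))
        for j in range(n)
    ]
    scores = [sum(hidden[k] * W_OUT[k][v] for k in range(n)) for v in range(len(W_IN))]
    log_z = math.log(sum(math.exp(s) for s in scores))  # softmax normaliser
    return scores[word] - log_z, hidden


# Score the word sequence 1 2 after <s>, threading the context through.
context, history, total = [0.0, 0.0], [0], 0.0
for w in [1, 2]:
    logprob, context = get_log_prob(w, history, context)
    total += logprob
    history.append(w)
print(total)
```

Passing the context explicitly keeps the model itself stateless, so many partial hypotheses can share one model object.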
class kaldi.lm.KaldiRnnlmWrapperOpts
Options for wrapping a Mikolov RNNLM.
- eos_symbol
  EOS symbol, e.g. </s>.
- register(opts:OptionsItf)
  Registers options with an object implementing the options interface.
  Parameters: opts (OptionsItf) – An object implementing the options interface, typically a command-line option parser.
- unk_symbol
  Unknown-word symbol, e.g. <unk>.
class kaldi.lm.RnnlmDeterministicFst
Deterministic on-demand Mikolov RNNLM FST.
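- final(state:int) → TropicalWeight
  Returns the final weight of the given state.
- get_arc(s:int, ilabel:int) → (success:bool, oarc:StdArc)
  Creates an arc on demand and returns it.
  Parameters:
  - s (int) – The state the arc leaves.
  - ilabel (int) – The input label of the arc.
  Returns: A tuple (success, oarc), where success indicates whether an arc with input label ilabel leaves state s and oarc is the created arc.
- start() → int
  Returns the start state index.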
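Like ConstArpaLmDeterministicFst, RnnlmDeterministicFst creates states on demand, but each state must also remember the RNNLM context vector reached along its history (compare KaldiRnnlmWrapper.get_log_prob). The sketch below is a hypothetical illustration, not the kaldi.lm implementation. `ToyRnnlmDeterministicFst` accepts any scorer shaped like (word, wseq, context_in) → (logprob, context_out), caps the history that identifies a state with an invented `max_order` parameter, and uses a trivial stand-in scorer so the block runs on its own.

```python
import math


def toy_log_prob(word, wseq, context_in):
    """Stand-in scorer: uniform over 4 labels; the context counts words scored."""
    return math.log(0.25), [context_in[0] + 1.0]


class ToyRnnlmDeterministicFst:
    def __init__(self, log_prob_fn, bos, eos, init_context, max_order=3):
        self.log_prob = log_prob_fn
        self.eos = eos
        self.max_order = max_order  # hypothetical cap on state history length
        self.state_to_wseq = [(bos,)]
        self.state_to_context = [list(init_context)]
        self.wseq_to_state = {(bos,): 0}

    def start(self):
        return 0

    def final(self, state):
        lp, _ = self.log_prob(self.eos, list(self.state_to_wseq[state]),
                              self.state_to_context[state])
        return -lp

    def get_arc(self, s, ilabel):
        wseq = self.state_to_wseq[s]
        lp, new_context = self.log_prob(ilabel, list(wseq), self.state_to_context[s])
        full = wseq + (ilabel,)
        key = full[max(0, len(full) - (self.max_order - 1)):]
        if key not in self.wseq_to_state:
            # A new state stores the context computed on the path that first
            # reached it; later paths with the same truncated history reuse it.
            self.wseq_to_state[key] = len(self.state_to_wseq)
            self.state_to_wseq.append(key)
            self.state_to_context.append(new_context)
        return True, (ilabel, ilabel, -lp, self.wseq_to_state[key])


fst = ToyRnnlmDeterministicFst(toy_log_prob, bos=0, eos=2, init_context=[0.0])
_, arc = fst.get_arc(fst.start(), 1)
print(arc[3], fst.state_to_context[arc[3]])  # 1 [1.0]
```

Truncating the history keeps the number of states bounded, at the price of approximating the RNNLM: two paths that agree on their last few words share one stored context.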
kaldi.lm.build_const_arpa_lm(options:ArpaParseOptions, arpa_rxfilename:str, const_arpa_wxfilename:str) → bool
Builds a constant ARPA language model from an ARPA language model.
Reads in an ARPA-format language model, converts it into a constant ARPA language model, and writes it out in binary format.