kaldi.lm

Functions

build_const_arpa_lm Builds a constant ARPA language model from an ARPA language model.

Classes

ArpaLmCompiler ARPA LM compiler.
ArpaParseOptions Options for parsing ARPA files.
ConstArpaLm Constant ARPA language model.
ConstArpaLmDeterministicFst Deterministic on-demand constant ARPA language model.
KaldiRnnlmWrapper Mikolov RNNLM wrapper.
KaldiRnnlmWrapperOpts Options for wrapping a Mikolov RNNLM.
RnnlmDeterministicFst Deterministic on-demand Mikolov RNNLM FST.
class kaldi.lm.ArpaLmCompiler

ARPA LM compiler.

Constructs the ARPA LM compiler with the given options and an optional symbol table.

If a symbol table is provided, the ARPA file to be read should contain text n-grams, and the words are mapped to labels using the table. bos_symbol and eos_symbol in the options structure must be valid labels in the table, as must unk_symbol if provided. The compiler does not own the table but may augment it if oov_handling is set to ArpaFileParser.OovHandling.ADD_TO_SYMBOLS.

If the symbol table is None, the ARPA file to be read should contain integer label n-grams, and oov_handling has no effect. bos_symbol and eos_symbol must still be valid labels.

Parameters:
  • options (ArpaParseOptions) – The options for parsing ARPA files.
  • sub_eps (int) – The disambiguation symbol to substitute for epsilon. If set to 0, bos_symbol and eos_symbol are treated as real symbols. Otherwise they are treated as epsilons.
  • symbols (SymbolTable) – The symbol table.
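
The interaction between the symbol table and oov_handling described above can be sketched in plain Python. This is only an illustration under stated assumptions: a dict stands in for the SymbolTable, and the function name map_ngram, the policy strings, and the "drop the n-gram" fallback are hypothetical and not part of kaldi.lm.

```python
# Illustrative sketch only: a plain dict stands in for the symbol table,
# and map_ngram plus the policy strings are hypothetical names.

def map_ngram(words, symbols, oov_handling, unk_symbol=None):
    """Map the words of one text n-gram to integer labels.

    Returns the list of labels, or None if the n-gram is dropped.
    """
    labels = []
    for word in words:
        if word in symbols:
            labels.append(symbols[word])
        elif oov_handling == "ADD_TO_SYMBOLS":
            # The table is augmented in place with a fresh label.
            symbols[word] = max(symbols.values()) + 1
            labels.append(symbols[word])
        elif oov_handling == "REPLACE_WITH_UNK":
            if unk_symbol is None:
                raise ValueError("unk_symbol is required for REPLACE_WITH_UNK")
            labels.append(unk_symbol)
        else:
            # Assumption of this sketch: any other policy drops the n-gram.
            return None
    return labels


symbols = {"<eps>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "hello": 4}
replaced = map_ngram(["hello", "world"], symbols, "REPLACE_WITH_UNK", unk_symbol=3)
added = map_ngram(["hello", "world"], symbols, "ADD_TO_SYMBOLS")
```

In this sketch the second call mutates the table, which mirrors the note that the table may be augmented even though the compiler does not own it.
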
fst() → StdVectorFst

Returns a copy of the FST compiled from the ARPA LM file.

mutable_fst() → StdVectorFst

Returns the internal FST compiled from the ARPA LM file.

options() → ArpaParseOptions

Gets ARPA parser options.

Returns: The ARPA parser options.
read(is:istream)

Reads ARPA LM file from input stream.

Parameters: is (istream) – The input C++ stream.
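
For orientation, the text layout of an ARPA file can be illustrated with a small pure-Python parser. This is a minimal sketch, not the reader used by ArpaLmCompiler: it takes a Python text stream rather than a C++ istream, the function name parse_arpa is hypothetical, and the sample model is made up.

```python
import io

# A tiny made-up ARPA file: each entry is "log10_prob<TAB>words[<TAB>log10_backoff]".
ARPA_TEXT = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\t<s>\t-0.5",
    "-0.7\thello\t-0.3",
    "-0.4\t</s>",
    "",
    "\\2-grams:",
    "-0.2\t<s> hello",
    "-0.1\thello </s>",
    "",
    "\\end\\",
]) + "\n"


def parse_arpa(stream):
    """Return {order: {ngram_tuple: (log10_prob, log10_backoff)}}."""
    ngrams = {}
    order = None
    for raw in stream:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # skip blanks and the header counts
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:" sets the current order.
            order = int(line[1:line.index("-")])
            ngrams[order] = {}
            continue
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; treat a missing one as 0.0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        ngrams[order][words] = (logprob, backoff)
    return ngrams


model = parse_arpa(io.StringIO(ARPA_TEXT))
```
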
class kaldi.lm.ArpaParseOptions

Options for parsing ARPA files.

OovHandling

alias of ArpaParseOptions.OovHandling

bos_symbol

Symbol for <s>. Required; must not be epsilon.

eos_symbol

Symbol for </s>. Required; must not be epsilon.

max_warnings

Maximum number of warnings to report. A negative value means unlimited.

oov_handling

How to handle OOV words in the file.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
unk_symbol

Symbol for <unk>. Required for OovHandling.REPLACE_WITH_UNK.

class kaldi.lm.ConstArpaLm

Constant ARPA language model.

get_bos_symbol() → int

Returns the label for the <s> symbol.

get_eos_symbol() → int

Returns the label for the </s> symbol.

get_ngram_log_prob(word:int, hist:list<int>) → float

Gets the n-gram log probability for the word given the history.

It first maps possible out-of-vocabulary words to <unk>, if <unk> is defined, and then computes the n-gram log probability.
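
The two steps described above (map out-of-vocabulary words to <unk>, then look up the n-gram with backoff) can be sketched in plain Python. This is a minimal sketch of standard backoff scoring, not the ConstArpaLm implementation; the function name toy_ngram_log_prob, the dict-based model, and the assumption that hist is ordered oldest word first are all illustrative.

```python
def toy_ngram_log_prob(word, hist, probs, backoffs, vocab, unk=None):
    """Backoff n-gram log probability over integer labels (illustrative).

    probs maps n-gram tuples to log probabilities, backoffs maps history
    tuples to log backoff weights. hist is assumed oldest-first.
    """
    # Step 1: map out-of-vocabulary labels to <unk>, if <unk> is defined.
    if unk is not None:
        word = word if word in vocab else unk
        hist = [w if w in vocab else unk for w in hist]
    hist = tuple(hist)

    # Step 2: try the longest n-gram, backing off to shorter histories.
    total = 0.0
    while True:
        ngram = hist + (word,)
        if ngram in probs:
            return total + probs[ngram]
        if not hist:
            return float("-inf")  # unseen unigram and no <unk> mapping
        total += backoffs.get(hist, 0.0)  # missing backoff weight counts as 0
        hist = hist[1:]  # drop the oldest word


# Made-up model: 1=<s>, 2=</s>, 3=<unk>, 4=hello
vocab = {1, 2, 3, 4}
probs = {(3,): -2.0, (4,): -0.75, (2,): -0.5, (1, 4): -0.25}
backoffs = {(1,): -0.5, (4,): -0.25}

seen = toy_ngram_log_prob(4, [1], probs, backoffs, vocab, unk=3)
backed_off = toy_ngram_log_prob(2, [4], probs, backoffs, vocab, unk=3)
oov = toy_ngram_log_prob(99, [1], probs, backoffs, vocab, unk=3)
```
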

get_ngram_order() → int

Returns the n-gram order.

get_unk_symbol() → int

Returns the label for the <unk> symbol.

history_state_exists(hist:list<int>) → bool

Checks whether a history state exists in the model.

Returns: True if the history word sequence hist has a successor in the model, which means hist will be a state in the FST-format language model.
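
One way to read "has a successor" is that some stored n-gram extends the history by exactly one word. The sketch below illustrates that reading over a plain set of n-gram tuples; the function name toy_history_state_exists and the data are hypothetical, and the actual check in ConstArpaLm may differ in detail.

```python
def toy_history_state_exists(hist, ngrams):
    """True if some n-gram in `ngrams` extends `hist` by exactly one word."""
    hist = tuple(hist)
    return any(
        len(ngram) == len(hist) + 1 and ngram[:-1] == hist
        for ngram in ngrams
    )


# Made-up trigram model contents over integer labels.
ngrams = {(1,), (4,), (1, 4), (1, 4, 2)}

has_bigram_state = toy_history_state_exists([1, 4], ngrams)  # (1, 4, 2) extends it
has_dead_end = toy_history_state_exists([4], ngrams)         # nothing follows (4,)
```
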
read(is:istream, binary:bool)

Reads the model from an input stream.

write(os:ostream, binary:bool)

Writes the model to an output stream.

write_arpa(os:ostream)

Writes the model to an output stream as an ARPA file.

class kaldi.lm.ConstArpaLmDeterministicFst

Deterministic on-demand constant ARPA language model.

final(state:int) → TropicalWeight

Returns the final weight of the given state.

get_arc(s:int, ilabel:int) -> (success:bool, oarc:StdArc)

Creates an on-demand arc and returns it.

Parameters:
  • s (int) – State index.
  • ilabel (int) – Arc label.
Returns:

A (success, oarc) pair: a flag indicating whether the arc could be created, and the created arc.
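
The idea of a deterministic on-demand LM FST, where states are materialized only when an arc is requested, can be sketched in plain Python. This is an illustration under stated assumptions, not the ConstArpaLmDeterministicFst implementation: the class ToyOnDemandLmFst, the score callback, and the tuple used for an arc are hypothetical, and arc weights are taken to be negative log probabilities as in the tropical semiring.

```python
class ToyOnDemandLmFst:
    """Toy on-demand FST whose states are LM histories (illustrative)."""

    def __init__(self, score, bos, eos, order):
        self._score = score  # callable(word, hist_tuple) -> log prob or None
        self._eos = eos
        self._keep = order - 1  # number of history words a state remembers
        self._state_to_hist = [(bos,)]
        self._hist_to_state = {(bos,): 0}

    def start(self):
        return 0

    def final(self, state):
        logprob = self._score(self._eos, self._state_to_hist[state])
        return float("inf") if logprob is None else -logprob

    def get_arc(self, s, ilabel):
        hist = self._state_to_hist[s]
        logprob = self._score(ilabel, hist)
        if logprob is None:
            return False, None
        full = hist + (ilabel,)
        next_hist = full[max(len(full) - self._keep, 0):]
        # Create the destination state lazily, reusing it if already known.
        if next_hist not in self._hist_to_state:
            self._hist_to_state[next_hist] = len(self._state_to_hist)
            self._state_to_hist.append(next_hist)
        # Arc layout in this sketch: (ilabel, olabel, weight, nextstate).
        return True, (ilabel, ilabel, -logprob, self._hist_to_state[next_hist])


# Made-up bigram scores: 1=<s>, 2=</s>, 4=hello
TABLE = {((1,), 4): -0.25, ((4,), 2): -0.5, ((4,), 4): -1.0}
fst = ToyOnDemandLmFst(lambda w, h: TABLE.get((h, w)), bos=1, eos=2, order=2)

ok, arc = fst.get_arc(fst.start(), 4)
ok_again, arc_again = fst.get_arc(arc[3], 4)
missing, no_arc = fst.get_arc(fst.start(), 7)
```

Requesting the same label from the same state always yields the same destination, which is what makes the toy FST deterministic.
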

start() → int

Returns the start state index.

class kaldi.lm.KaldiRnnlmWrapper

Mikolov RNNLM wrapper.

get_eos() → int

Returns the label for the EOS symbol.

get_hidden_layer_size() → int

Returns the size of the hidden layer.

get_log_prob(word:int, wseq:list<int>, context_in:list<float>) -> (logprob:float, context_out:list<float>)

Computes the log probability of the word given the history.

Computes the log probability of the word given the history and the initial context vector. Returns the log probability and the final context vector.
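
The calling pattern implied by this signature, where the context vector returned by one call is fed into the next, can be sketched with a stand-in scorer. Everything here is hypothetical: toy_get_log_prob is a made-up recurrent function that only mimics the shape of the inputs and outputs, and it makes no claim about how the wrapped RNNLM uses wseq or the context.

```python
import math

VOCAB_SIZE = 4   # made-up vocabulary size
HIDDEN_SIZE = 3  # made-up hidden layer size


def toy_get_log_prob(word, wseq, context_in):
    """Hypothetical stand-in scorer: returns (logprob, context_out)."""
    prev = wseq[-1] if wseq else 0
    # Recurrent update: the new context depends on the old one and the last word.
    context_out = [
        math.tanh(0.5 * c + 0.1 * (prev + i)) for i, c in enumerate(context_in)
    ]
    # Log-softmax over made-up logits derived from the new context.
    logits = [sum(context_out) * 0.1 * (v + 1) for v in range(VOCAB_SIZE)]
    log_norm = math.log(sum(math.exp(x) for x in logits))
    return logits[word] - log_norm, context_out


def score_sentence(words):
    """Thread the context vector through successive calls."""
    context = [0.0] * HIDDEN_SIZE
    history = []
    total = 0.0
    for word in words:
        logprob, context = toy_get_log_prob(word, history, context)
        total += logprob
        history.append(word)
    return total, context


total_logprob, final_context = score_sentence([1, 2, 3])
```
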

class kaldi.lm.KaldiRnnlmWrapperOpts

Options for wrapping a Mikolov RNNLM.

eos_symbol

EOS symbol, e.g. </s>.

register(opts:OptionsItf)

Registers options with an object implementing the options interface.

Parameters: opts (OptionsItf) – An object implementing the options interface. Typically a command-line option parser.
unk_symbol

Unknown symbol, e.g. <unk>.

class kaldi.lm.RnnlmDeterministicFst

Deterministic on-demand Mikolov RNNLM FST.

final(state:int) → TropicalWeight

Returns the final weight of the given state.

get_arc(s:int, ilabel:int) -> (success:bool, oarc:StdArc)

Creates an on-demand arc and returns it.

Parameters:
  • s (int) – State index.
  • ilabel (int) – Arc label.
Returns:

A (success, oarc) pair: a flag indicating whether the arc could be created, and the created arc.

start() → int

Returns the start state index.

kaldi.lm.build_const_arpa_lm(options:ArpaParseOptions, arpa_rxfilename:str, const_arpa_wxfilename:str) → bool

Builds a constant ARPA language model from an ARPA language model.

Reads in an ARPA format language model, converts it into a constant ARPA language model and writes it out in binary format.