A Python library for working with weighted context-free grammars (WCFGs), weighted finite state automata (WFSAs) and weighted finite state transducers (WFSTs). The library provides efficient implementations for grammar operations, parsing algorithms, and language model functionality.
This library can be installed via pip:

    pip install genlm-grammar

- Support for weighted context-free grammars with various semirings (Boolean, Float, Real, MaxPlus, MaxTimes, etc.)
- Grammar transformations:
  - Local normalization
  - Removal of nullary rules and unary cycles
  - Grammar binarization
  - Length truncation
  - Renaming/renumbering of nonterminals
- Earley parsing (O(n³|G|) complexity)
  - Standard implementation
  - Rescaled version for numerical stability
- CKY parsing
  - Incremental CKY with chart caching
  - Support for prefix computations
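
To make the interplay between semirings and CKY parsing concrete, here is a minimal, self-contained sketch in plain Python. It does not use this library's API: the `Semiring` class, the `cky` function, and the dictionary-based grammar encoding are all hypothetical names invented for illustration, and the grammar is assumed to be in Chomsky normal form with no nullary rules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A semiring described by its two operations and their identities."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Illustrative semirings (names are local to this sketch, not the library's).
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def cky(tokens, start, unary, binary, sr):
    """Semiring-sum over all derivations of `tokens` from `start`.

    unary:  {(A, terminal): weight} for rules A -> terminal
    binary: {(A, B, C): weight}     for rules A -> B C
    """
    n = len(tokens)
    if n == 0:
        return sr.zero  # a CNF grammar without nullary rules rejects ""
    chart = {}  # chart[i, k][A] = weight of A deriving tokens[i:k]
    for i, tok in enumerate(tokens):
        cell = {}
        for (lhs, term), w in unary.items():
            if term == tok:
                cell[lhs] = sr.plus(cell.get(lhs, sr.zero), w)
        chart[i, i + 1] = cell
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = {}
            for j in range(i + 1, k):  # split point
                left, right = chart[i, j], chart[j, k]
                for (lhs, b, c), w in binary.items():
                    if b in left and c in right:
                        score = sr.times(w, sr.times(left[b], right[c]))
                        cell[lhs] = sr.plus(cell.get(lhs, sr.zero), score)
            chart[i, k] = cell
    return chart[0, n].get(start, sr.zero)


# Ambiguous toy grammar: S -> S S (0.5) | a (0.5)
UNARY = {("S", "a"): 0.5}
BINARY = {("S", "S", "S"): 0.5}

total = cky(["a", "a", "a"], "S", UNARY, BINARY, REAL)      # 0.0625
best = cky(["a", "a", "a"], "S", UNARY, BINARY, MAX_TIMES)  # 0.03125
```

The string `a a a` has two parse trees, each of weight 0.5⁵ = 0.03125. Under the real semiring the chart sums them, and under max-times it keeps the best one, so the same parsing loop serves both purposes by swapping the semiring.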
- BoolCFGLM: Boolean-weighted CFG language model
- CKYLM: Probabilistic CFG language model using CKY
- EarleyLM: Language model using Earley parsing
- Weighted FSA implementation
- Operations:
  - Epsilon removal
  - Minimization (Brzozowski's algorithm)
  - Determinization
  - Composition
  - Reversal
  - Kleene star/plus
- Semiring abstractions (Boolean, Float, Log, Entropy, etc.)
- Efficient chart and agenda-based algorithms
- Grammar-FST composition
- Visualization support via Graphviz
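
As an illustration of how the reversal, determinization, and minimization operations listed above fit together, the following sketch implements Brzozowski's algorithm (reverse, determinize, reverse, determinize) for an unweighted, epsilon-free automaton. It is written in plain Python and is not this library's implementation: the tuple representation `(starts, finals, arcs)` and every function name are assumptions made for this example, and weights are left out for brevity.

```python
from collections import deque


def reverse(starts, finals, arcs):
    """Swap start and final states and flip every arc."""
    return set(finals), set(starts), {(d, sym, s) for (s, sym, d) in arcs}


def determinize(starts, finals, arcs):
    """Subset construction over reachable subsets (dead state omitted)."""
    step = {}
    for src, sym, dst in arcs:
        step.setdefault((src, sym), set()).add(dst)
    alphabet = {sym for _, sym, _ in arcs}
    finals = set(finals)
    start = frozenset(starts)
    seen, queue, new_arcs = {start}, deque([start]), set()
    while queue:
        subset = queue.popleft()
        for sym in alphabet:
            target = frozenset(
                q for s in subset for q in step.get((s, sym), ())
            )
            if not target:
                continue  # no successor: leave the transition undefined
            new_arcs.add((subset, sym, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    new_finals = {s for s in seen if s & finals}
    return {start}, new_finals, new_arcs


def brzozowski(starts, finals, arcs):
    """Minimize via determinize(reverse(determinize(reverse(A))))."""
    return determinize(*reverse(*determinize(*reverse(starts, finals, arcs))))


def states(starts, finals, arcs):
    """All states mentioned anywhere in the automaton."""
    return (set(starts) | set(finals)
            | {s for s, _, _ in arcs} | {d for _, _, d in arcs})


def accepts(starts, finals, arcs, word):
    """Simulate the (possibly nondeterministic) automaton on `word`."""
    current = set(starts)
    for sym in word:
        current = {d for (s, a, d) in arcs if s in current and a == sym}
    return bool(current & set(finals))


# NFA for strings over {a, b} that end in "ab".
NFA = ({0}, {2}, {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)})
MINIMAL = brzozowski(*NFA)
```

For this input the result is a three-state deterministic automaton that accepts the same language as the original NFA, which matches the size of the minimal DFA for "ends in `ab`".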
See DEVELOPING.md for information on how to install the package in development mode.
