
0.1.0-alpha2

Pre-release

@strangetom strangetom released this 12 Sep 18:41
· 1457 commits to master since this release

Incremental changes:

  • Improved documentation
    • Automatically extract code and version from source files.
  • Added regular expression based parser
    • This provides an alternative to the CRF-based parser, but is more limited
  • Improvements to labelling of New York Times dataset
    • Label size modifiers for a unit as part of the unit, e.g. large clove, small bunch
    • Consistent labelling of "juice of..." variants
    • Consistent labelling of "chopped"
    • Consistent labelling of "package"
    • Reduce the number of tokens labelled as OTHER because they were missing from the label
  • Fixes and improvements to pre-processing input sentences
    • Expand list of units to be singularised
    • Fix the pre-processing incorrectly handling words with different letter cases
    • Improve matching and replacement of string numbers e.g. one -> 1
    • Fix Unicode fraction replacement not replacing the fractions
  • Improvements to post-processing the model output
    • Pluralise units if the quantity is not singular
  • Start adding tests to PreProcessor class methods
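
The pre-processing and post-processing items above can be illustrated with a minimal sketch. This is not the library's implementation: the lookup tables and the function names (replace_string_numbers, replace_unicode_fractions, singularise_units, pluralise_unit) are hypothetical, chosen only to show the kind of transformation each changelog entry describes. It also assumes that any quantity other than exactly 1 takes a plural unit.

```python
import re

# Hypothetical lookup tables for illustration; real lists would be much longer.
STRING_NUMBERS = {"one": "1", "two": "2", "three": "3"}
UNICODE_FRACTIONS = {"½": "1/2", "¼": "1/4", "¾": "3/4"}
UNIT_PLURALS = {"cup": "cups", "clove": "cloves", "bunch": "bunches"}
UNIT_SINGULARS = {plural: singular for singular, plural in UNIT_PLURALS.items()}


def replace_string_numbers(sentence: str) -> str:
    """Replace whole-word number names with digits, ignoring case."""
    pattern = re.compile(
        r"\b(" + "|".join(STRING_NUMBERS) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: STRING_NUMBERS[m.group(1).lower()], sentence)


def replace_unicode_fractions(sentence: str) -> str:
    """Replace characters such as ½ with ASCII text such as 1/2.

    A space is inserted before the fraction so that "1½" becomes "1 1/2".
    """
    for char, text in UNICODE_FRACTIONS.items():
        sentence = sentence.replace(char, " " + text)
    return sentence.strip()


def singularise_units(sentence: str) -> str:
    """Replace known plural units with their singular form, ignoring case."""
    return " ".join(
        UNIT_SINGULARS.get(token.lower(), token) for token in sentence.split()
    )


def pluralise_unit(quantity: str, unit: str) -> str:
    """Return the plural unit unless the quantity is exactly 1 (an assumption)."""
    try:
        is_singular = float(quantity) == 1.0
    except ValueError:
        # Quantities that are not plain numbers (e.g. ranges) are assumed plural.
        is_singular = False
    return unit if is_singular else UNIT_PLURALS.get(unit, unit)
```

In this sketch, the pre-processing helpers would run on the raw sentence before it reaches the parser, and pluralise_unit would run on the parsed quantity and unit afterwards.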