0.1.0-alpha2
Pre-release
Incremental changes:
- Improved documentation
- Automatically extract code and version from source files.
- Added regular-expression-based parser
  - This provides an alternative to the CRF-based parser, but is more limited
- Improvements to labelling of New York Times dataset
  - Label size modifiers for a unit as part of the unit, e.g. large clove, small bunch
  - Consistent labelling of "juice of..." variants
  - Consistent labelling of "chopped"
  - Consistent labelling of "package"
  - Reduce the number of tokens labelled as OTHER because they were missing from the label
- Fixes and improvements to pre-processing input sentences
  - Expand list of units to be singularised
  - Fix the pre-processing incorrectly handling words with different cases
  - Improve matching and replacement of string numbers, e.g. one -> 1
  - Fix Unicode fraction replacement not replacing
- Improvements to post-processing the model output
  - Pluralise units if the quantity is not singular
- Start adding tests to PreProcessor class methods
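
The pre-processing and post-processing items above can be illustrated with a minimal sketch. This is not the project's implementation: the function names, lookup tables, and regular expression below are hypothetical and chosen only to show the kind of transformation each item describes (case-insensitive string-number replacement, Unicode fraction replacement, and pluralising a unit when the quantity is not 1).

```python
import re

# Hypothetical lookup tables for illustration; not taken from the project.
STRING_NUMBERS = {"one": "1", "two": "2", "three": "3"}
UNICODE_FRACTIONS = {"½": "1/2", "¼": "1/4", "¾": "3/4", "⅓": "1/3", "⅔": "2/3"}
PLURAL_UNITS = {"cup": "cups", "clove": "cloves", "bunch": "bunches"}

# Match whole words only, ignoring case, so "One" matches but "stone" does not.
STRING_NUMBER_RE = re.compile(
    r"\b(" + "|".join(STRING_NUMBERS) + r")\b", re.IGNORECASE
)


def replace_string_numbers(sentence: str) -> str:
    """Replace spelled-out numbers with digits, e.g. one -> 1."""
    return STRING_NUMBER_RE.sub(
        lambda match: STRING_NUMBERS[match.group(1).lower()], sentence
    )


def replace_unicode_fractions(sentence: str) -> str:
    """Replace Unicode fraction characters with ASCII fractions."""
    for char, text in UNICODE_FRACTIONS.items():
        # The leading space keeps "1½" readable as "1 1/2".
        sentence = sentence.replace(char, f" {text}")
    # Collapse any doubled or leading whitespace introduced above.
    return " ".join(sentence.split())


def pluralise_unit(quantity: float, unit: str) -> str:
    """Return the plural form of a unit unless the quantity is exactly 1."""
    if quantity != 1:
        return PLURAL_UNITS.get(unit, unit)
    return unit
```

For example, `replace_string_numbers("One clove garlic")` gives `"1 clove garlic"`, `replace_unicode_fractions("1½ cups flour")` gives `"1 1/2 cups flour"`, and `pluralise_unit(2, "clove")` gives `"cloves"`. Units missing from the hypothetical table are returned unchanged.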