0.1.0-beta8
Pre-release
General
- Support Python 3.12
Model
- Include more training data, expanding the Cookstr and BBC data by 5,000 additional sentences each
- Change how the training data is stored. An SQLite database is now used to store the sentences and their tokens and labels. This fixes a long-standing bug where tokens in the training data would be assigned the wrong label. CSV exports are still available.
- Discard any sentences containing the OTHER label prior to training the model, so a parsed ingredient sentence can never contain anything labelled OTHER.
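
The storage and filtering changes above can be pictured with a minimal sketch. The schema, the column names, the helper functions and the `QTY`/`UNIT`/`NAME` labels below are assumptions made for illustration only; the release notes do not describe the actual database layout. The idea shown is that keeping each sentence's tokens and labels in the same row makes it hard for them to drift out of alignment, and that sentences containing OTHER are skipped when the training rows are loaded.

```python
import json
import sqlite3


def build_db(rows):
    """Create an in-memory database with one row per training sentence.

    Hypothetical schema: tokens and labels are stored side by side as JSON
    arrays, so a token cannot silently pick up another token's label.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training ("
        "id INTEGER PRIMARY KEY, source TEXT, sentence TEXT, "
        "tokens TEXT, labels TEXT)"
    )
    for source, sentence, tokens, labels in rows:
        if len(tokens) != len(labels):
            raise ValueError(f"token/label length mismatch: {sentence!r}")
        conn.execute(
            "INSERT INTO training (source, sentence, tokens, labels) "
            "VALUES (?, ?, ?, ?)",
            (source, sentence, json.dumps(tokens), json.dumps(labels)),
        )
    conn.commit()
    return conn


def load_training_rows(conn):
    """Return (tokens, labels) pairs, skipping sentences that contain OTHER."""
    kept = []
    query = "SELECT tokens, labels FROM training ORDER BY id"
    for tokens_json, labels_json in conn.execute(query):
        tokens = json.loads(tokens_json)
        labels = json.loads(labels_json)
        if "OTHER" in labels:
            continue  # discarded before training
        kept.append((tokens, labels))
    return kept


# Illustrative data only; not taken from the real training set.
conn = build_db(
    [
        ("bbc", "100 g flour", ["100", "g", "flour"], ["QTY", "UNIT", "NAME"]),
        ("cookstr", "For the sauce", ["For", "the", "sauce"], ["OTHER", "OTHER", "OTHER"]),
    ]
)
rows = load_training_rows(conn)
```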
Processing
- Remove `other` field from `ParsedIngredient` return from `parse_ingredient` function.
- Add `text` field to `IngredientAmount`. This is auto-generated when the object is created and provides a human-readable string for the amount e.g. "100 g"
- Allow SINGULAR flag to be set if the amount it's being applied to is in brackets
- Where a sentence has multiple related amounts e.g. `14 ounce (400 g)`, any flags set for one of the related amounts are applied to all the related amounts
- Rewrite the tokeniser so it doesn't require all handled characters to be explicitly stated
- Add an option to `parse_ingredient` to discard isolated stop words that appear in the name, comment and preparation fields.
- `IngredientAmount.amount` elements are now ordered to match the order in which they appear in the sentence.
- Initial support for composite ingredient amounts e.g. `1 lb 2 oz` is now considered to be a single `CompositeIngredientAmount` instead of two separate `IngredientAmount`.
  - Further work required to handle other cases such as `1 tablespoon plus 1 teaspoon`.
  - This solution may change as it develops
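
As a rough illustration of three of the changes above (the auto-generated `text` field, flags shared across related amounts, and composite amounts), here is a self-contained sketch. The `AmountSketch` and `CompositeAmountSketch` classes and the `share_flags` helper are hypothetical stand-ins written for this example; they are not the library's `IngredientAmount` or `CompositeIngredientAmount` classes, whose real fields and behaviour may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AmountSketch:
    """Hypothetical stand-in for a single parsed amount."""

    quantity: str
    unit: str
    singular: bool = False
    text: str = field(init=False, default="")

    def __post_init__(self):
        # Build the human-readable string when the object is created.
        self.text = f"{self.quantity} {self.unit}".strip()


@dataclass
class CompositeAmountSketch:
    """Hypothetical stand-in for several amounts that form one quantity."""

    amounts: list
    text: str = field(init=False, default="")

    def __post_init__(self):
        self.text = " ".join(amount.text for amount in self.amounts)


def share_flags(related):
    """Apply a flag set on any one related amount to all of them."""
    singular = any(amount.singular for amount in related)
    for amount in related:
        amount.singular = singular
    return related


grams = AmountSketch("100", "g")
print(grams.text)  # 100 g

# "14 ounce (400 g)": a flag on one related amount is copied to the other.
related = share_flags(
    [AmountSketch("14", "ounce", singular=True), AmountSketch("400", "g")]
)

# "1 lb 2 oz" treated as one composite amount instead of two separate ones.
composite = CompositeAmountSketch([AmountSketch("1", "lb"), AmountSketch("2", "oz")])
print(composite.text)  # 1 lb 2 oz
```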