0.1.0-beta8

Pre-release

strangetom released this 27 Jan 11:16

6f5f230

General

Support Python 3.12

Model

Include more training data, expanding the Cookstr and BBC data by 5,000 additional sentences each
Change how the training data is stored. An SQLite database is now used to store the sentences and their tokens and labels. This fixes a long-standing bug where tokens in the training data would be assigned the wrong label. CSV exports are still available.
Discard any sentences containing the OTHER label prior to training the model, so a parsed ingredient sentence can never contain anything labelled OTHER.
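
As a rough illustration of the two storage changes above, the sketch below keeps each sentence with its tokens and labels in a single SQLite row and skips any sentence containing an OTHER label when loading. The table name, column names, JSON encoding, example rows and the helper functions create_demo_db and load_training_sentences are all assumptions made for this example; the actual schema used by the project may differ.

```python
import json
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory database using a hypothetical schema (not the project's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_data ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, sentence TEXT NOT NULL, "
        "tokens TEXT NOT NULL, labels TEXT NOT NULL)"
    )
    # Illustrative rows only; tokens and labels are stored together as JSON lists
    # so a token cannot drift away from its label.
    examples = [
        ("bbc", "2 tbsp olive oil",
         ["2", "tbsp", "olive", "oil"], ["QTY", "UNIT", "NAME", "NAME"]),
        ("cookstr", "1 cup flour see note",
         ["1", "cup", "flour", "see", "note"], ["QTY", "UNIT", "NAME", "OTHER", "OTHER"]),
    ]
    conn.executemany(
        "INSERT INTO training_data (source, sentence, tokens, labels) VALUES (?, ?, ?, ?)",
        [(src, sent, json.dumps(toks), json.dumps(labs)) for src, sent, toks, labs in examples],
    )
    conn.commit()
    return conn


def load_training_sentences(conn: sqlite3.Connection) -> list[tuple[str, list[str], list[str]]]:
    """Return (sentence, tokens, labels) tuples, discarding sentences labelled OTHER."""
    kept = []
    query = "SELECT sentence, tokens, labels FROM training_data ORDER BY id"
    for sentence, tokens_json, labels_json in conn.execute(query):
        tokens = json.loads(tokens_json)
        labels = json.loads(labels_json)
        if len(tokens) != len(labels):
            raise ValueError(f"Token/label length mismatch in: {sentence!r}")
        if "OTHER" in labels:
            continue  # dropped before training
        kept.append((sentence, tokens, labels))
    return kept


if __name__ == "__main__":
    for sentence, tokens, labels in load_training_sentences(create_demo_db()):
        print(sentence, list(zip(tokens, labels)))
```

Storing tokens and labels in the same row is one way to avoid the misalignment that separate per-token records can suffer from; it is shown here only as a plausible design, not as a description of the project's database.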

Processing

Remove the other field from the ParsedIngredient object returned by the parse_ingredient function.
Add a text field to IngredientAmount. This is auto-generated when the object is created and provides a human-readable string for the amount e.g. "100 g"
Allow SINGULAR flag to be set if the amount it's being applied to is in brackets
Where a sentence has multiple related amounts e.g. 14 ounce (400 g), any flags set for one of the related amounts are applied to all the related amounts
Rewrite the tokeniser so it doesn't require all handled characters to be explicitly stated
Add an option to parse_ingredient to discard isolated stop words that appear in the name, comment and preparation fields.
IngredientAmount.amount elements are now ordered to match the order in which they appear in the sentence.
Initial support for composite ingredient amounts e.g. 1 lb 2 oz is now considered to be a single CompositeIngredientAmount instead of two separate IngredientAmount objects.
- Further work required to handle other cases such as 1 tablespoon plus 1 teaspoon.
- This solution may change as it develops
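
To illustrate the auto-generated text field and the composite amount idea described above, here is a minimal sketch using dataclasses. The classes AmountSketch and CompositeAmountSketch, their fields, and the join parameter are hypothetical stand-ins written for this example; they are not the library's IngredientAmount or CompositeIngredientAmount classes, whose real fields and behaviour may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AmountSketch:
    """Hypothetical stand-in for a single amount such as "100 g"."""

    quantity: str
    unit: str
    # Not passed by the caller; filled in when the object is created.
    text: str = field(init=False)

    def __post_init__(self) -> None:
        self.text = f"{self.quantity} {self.unit}".strip()


@dataclass
class CompositeAmountSketch:
    """Hypothetical stand-in for several amounts that describe one quantity."""

    amounts: list[AmountSketch]
    join: str = ""  # optional joining word, e.g. "plus" (an assumption)
    text: str = field(init=False)

    def __post_init__(self) -> None:
        separator = f" {self.join} " if self.join else " "
        self.text = separator.join(amount.text for amount in self.amounts)


if __name__ == "__main__":
    print(AmountSketch("100", "g").text)  # 100 g
    composite = CompositeAmountSketch([AmountSketch("1", "lb"), AmountSketch("2", "oz")])
    print(composite.text)  # 1 lb 2 oz
```

Generating text in __post_init__ keeps the human-readable string in sync with the values supplied at construction, and a joining word is one possible way the "1 tablespoon plus 1 teaspoon" case could eventually be represented.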
