0.1.0-beta8

Pre-release

@strangetom strangetom released this 27 Jan 11:16
· 900 commits to master since this release

General

  • Support Python 3.12

Model

  • Include more training data, expanding the Cookstr and BBC data by 5,000 additional sentences each
  • Change how the training data is stored. An SQLite database is now used to store the sentences and their tokens and labels. This fixes a long-standing bug where tokens in the training data would be assigned the wrong label. CSV exports are still available.
  • Discard any sentences containing the OTHER label before training the model, so a parsed ingredient sentence can never contain anything labelled OTHER.
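
The storage and filtering changes above can be pictured with a small sketch. This is an illustration only, not the project's actual schema or training code: the table name, the column names, the JSON encoding of tokens and labels, and the `load_training_sentences` helper are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical schema: one row per sentence, with its tokens and labels
# stored side by side as JSON arrays so they cannot drift out of step.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training ("
    "id INTEGER PRIMARY KEY, sentence TEXT, tokens TEXT, labels TEXT)"
)

# Illustrative rows; the label names are examples, not a full label set.
rows = [
    ("2 cups flour", ["2", "cups", "flour"], ["QTY", "UNIT", "NAME"]),
    (
        "1 onion see note",
        ["1", "onion", "see", "note"],
        ["QTY", "NAME", "OTHER", "OTHER"],
    ),
]
conn.executemany(
    "INSERT INTO training (sentence, tokens, labels) VALUES (?, ?, ?)",
    [(s, json.dumps(t), json.dumps(l)) for s, t, l in rows],
)


def load_training_sentences(connection):
    """Return (sentence, tokens, labels) tuples, skipping any sentence
    that contains a token labelled OTHER."""
    kept = []
    query = "SELECT sentence, tokens, labels FROM training ORDER BY id"
    for sentence, tokens_json, labels_json in connection.execute(query):
        tokens = json.loads(tokens_json)
        labels = json.loads(labels_json)
        if len(tokens) != len(labels):
            raise ValueError(f"Token/label mismatch in: {sentence!r}")
        if "OTHER" in labels:
            continue  # discard the whole sentence before training
        kept.append((sentence, tokens, labels))
    return kept


training_data = load_training_sentences(conn)
```

Because whole sentences are dropped rather than individual tokens, a model trained on `training_data` never sees the OTHER label and so cannot predict it.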

Processing

  • Remove the other field from the ParsedIngredient object returned by the parse_ingredient function.
  • Add a text field to IngredientAmount. This is auto-generated when the object is created and provides a human-readable string for the amount, e.g. "100 g".
  • Allow SINGULAR flag to be set if the amount it's being applied to is in brackets
  • Where a sentence has multiple related amounts, e.g. 14 ounce (400 g), any flags set for one of the related amounts are applied to all the related amounts.
  • Rewrite the tokeniser so it doesn't require all handled characters to be explicitly stated
  • Add an option to parse_ingredient to discard isolated stop words that appear in the name, comment and preparation fields.
  • IngredientAmount.amount elements are now ordered to match the order in which they appear in the sentence.
  • Initial support for composite ingredient amounts, e.g. 1 lb 2 oz is now considered to be a single CompositeIngredientAmount instead of two separate IngredientAmount objects.
    • Further work is required to handle other cases such as 1 tablespoon plus 1 teaspoon.
    • This solution may change as it develops
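
As a rough illustration of three of the changes above (the generated text field, flags shared across related amounts, and composite amounts), here is a minimal sketch. The classes `AmountSketch` and `CompositeAmountSketch` and the helper `share_flags` are hypothetical stand-ins written for this example; they are not the library's IngredientAmount or CompositeIngredientAmount, whose fields and behaviour may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AmountSketch:
    """Hypothetical stand-in for a single parsed amount."""

    quantity: str
    unit: str
    singular: bool = False
    text: str = field(init=False)  # generated, never passed by the caller

    def __post_init__(self):
        # Build the human-readable string once, when the object is created.
        self.text = f"{self.quantity} {self.unit}".strip()


@dataclass
class CompositeAmountSketch:
    """Hypothetical stand-in for amounts that form one quantity together."""

    amounts: list
    text: str = field(init=False)

    def __post_init__(self):
        self.text = " ".join(amount.text for amount in self.amounts)


def share_flags(related):
    """Apply a flag set on any one related amount to all of them."""
    singular = any(amount.singular for amount in related)
    for amount in related:
        amount.singular = singular
    return related


simple = AmountSketch("100", "g")
composite = CompositeAmountSketch([AmountSketch("1", "lb"), AmountSketch("2", "oz")])
related = share_flags(
    [AmountSketch("14", "ounce", singular=True), AmountSketch("400", "g")]
)
```

Here `simple.text` is "100 g", `composite.text` is "1 lb 2 oz", and both entries in `related` end up with the singular flag set. Joining composite parts with a space would not cover phrasing such as 1 tablespoon plus 1 teaspoon, which the notes list as further work.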