
0.1.0-beta5

Pre-release


strangetom released this 16 Sep 19:57
  • Support the extraction of multiple amounts from the input sentence.
  • Change the output dataclass so that each field carries its own confidence value.
    • The name, comment and other fields are each output as an IngredientText object containing the text and its confidence.
    • The amounts are output as a list of IngredientAmount objects, each containing the quantity, unit and confidence, plus flags indicating whether the amount is approximate or refers to a singular item of the ingredient.
  • Rewrite the post-processing functionality to make it more maintainable and easier to extend in the future.
  • Add a model card, which describes the data used to train and evaluate the model, the purpose of the model and its limitations.
  • Increase L1 regularisation during model training.
    • This reduces model size by a factor of ~4.
    • This should improve performance on previously unseen sentences by forcing the model to rely less on specific words when labelling.
  • Improve the model guide in the documentation.
  • Add a simple webapp that can be used to view the output of the parser in a more human-readable way.
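Why a stronger L1 penalty shrinks the model can be shown with a toy example. The sketch below is not this project's training code. It assumes an orthonormal design, for which the solution of min_w 0.5·||y − Xw||² + lam·||w||₁ is exactly the unregularised weights passed through soft-thresholding. The weight values and the names `soft_threshold`, `l1_weights` and `count_nonzero` are hypothetical. Weak weights are clipped to exactly zero, and zero-valued features need not be stored.

```python
# Toy illustration (hypothetical values, NOT this project's training code):
# under an orthonormal design, the L1-penalised least-squares solution is
# the unregularised weights passed through soft-thresholding.


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value towards zero by lam, clipping to exactly 0.0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def l1_weights(ols_weights: list[float], lam: float) -> list[float]:
    """Apply the L1 proximal step (soft-thresholding) to every weight."""
    return [soft_threshold(w, lam) for w in ols_weights]


def count_nonzero(weights: list[float]) -> int:
    """Number of weights a model would actually need to store."""
    return sum(1 for w in weights if w != 0.0)


# Hypothetical unregularised weights: a few strong features and many weak
# ones (e.g. features that only fire on one specific word).
ols = [2.5, -1.8, 0.9, 0.3, -0.2, 0.15, -0.1, 0.05]

for lam in (0.0, 0.25, 1.0):
    w = l1_weights(ols, lam)
    print(f"lam={lam}: {count_nonzero(w)} non-zero weights")
# lam=0.0: 8 non-zero weights
# lam=0.25: 4 non-zero weights
# lam=1.0: 2 non-zero weights
```

Raising `lam` keeps only the strongest features. This is the same trade-off described above: a smaller model that leans less on rarely seen, word-specific features.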

Example output from this release:

>>> from ingredient_parser import parse_ingredient
>>> parse_ingredient("50ml/2fl oz/3½tbsp lavender honey (or other runny honey if unavailable)")
ParsedIngredient(
    name=IngredientText(
        text='lavender honey',
        confidence=0.998829),
    amount=[
        IngredientAmount(
            quantity='50',
            unit='ml',
            confidence=0.999189,
            APPROXIMATE=False,
            SINGULAR=False),
        IngredientAmount(
            quantity='2',
            unit='fl oz',
            confidence=0.980392,
            APPROXIMATE=False,
            SINGULAR=False),
        IngredientAmount(
            quantity='3.5',
            unit='tbsps',
            confidence=0.990711,
            APPROXIMATE=False,
            SINGULAR=False)
    ],
    comment=IngredientText(
        text='(or other runny honey if unavailable)',
        confidence=0.973682),
    other=None,
    sentence='50ml/2fl oz/3½tbsp lavender honey (or other runny honey if unavailable)'
)
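To show how the per-field confidence values and the list of amounts might be consumed, here is a minimal sketch. The classes below are a hypothetical re-creation of the structure in the example output above. Their field names are copied from that output, but the package's real definitions may differ. The helpers `describe_amounts` and `lowest_confidence`, and the "~" marker for approximate amounts, are also illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical re-creation of the output structure shown in the example;
# the package's real classes may have other fields, types or defaults.


@dataclass
class IngredientText:
    text: str
    confidence: float


@dataclass
class IngredientAmount:
    quantity: str
    unit: str
    confidence: float
    APPROXIMATE: bool = False
    SINGULAR: bool = False


@dataclass
class ParsedIngredient:
    name: Optional[IngredientText]
    amount: list[IngredientAmount]
    comment: Optional[IngredientText]
    other: Optional[IngredientText]
    sentence: str


def describe_amounts(parsed: ParsedIngredient) -> list[str]:
    """Render each amount as 'quantity unit', prefixing approximate ones with '~'."""
    return [
        f"{'~' if a.APPROXIMATE else ''}{a.quantity} {a.unit}" for a in parsed.amount
    ]


def lowest_confidence(parsed: ParsedIngredient) -> float:
    """Smallest confidence across all amounts and populated text fields."""
    scores = [a.confidence for a in parsed.amount]
    for part in (parsed.name, parsed.comment, parsed.other):
        if part is not None:
            scores.append(part.confidence)
    return min(scores)


# Values copied from the example output above.
example = ParsedIngredient(
    name=IngredientText("lavender honey", 0.998829),
    amount=[
        IngredientAmount("50", "ml", 0.999189),
        IngredientAmount("2", "fl oz", 0.980392),
        IngredientAmount("3.5", "tbsps", 0.990711),
    ],
    comment=IngredientText("(or other runny honey if unavailable)", 0.973682),
    other=None,
    sentence="50ml/2fl oz/3½tbsp lavender honey (or other runny honey if unavailable)",
)

print(describe_amounts(example))   # ['50 ml', '2 fl oz', '3.5 tbsps']
print(lowest_confidence(example))  # 0.973682
```

Keeping a confidence on every field lets a caller flag a single doubtful field, such as the comment here, without discarding the whole parse.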