Releases: strangetom/ingredient-parser
2.4.0
General
Warning
This release drops support for Python 3.10.
- Drop support for Python 3.10.
- Add support for Python 3.14.
- Require pint >= 0.25.0
Processing
- Improve the part of speech tagging accuracy by extending the built-in tagdict in NLTK's part of speech tagger with ingredient specific entries.
- Add name_index field to FoundationFood objects. This field refers to the index of the matching name in the ParsedIngredient.name list.
  - The list of names and foundation foods are also guaranteed to be in the same order (although be aware that a name may not have a matching foundation food).
- Improve processing of names, particularly related to handling of punctuation at the beginning or end of the name.
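As a rough illustration of how name_index could be used to line names up with their foundation foods, here is a minimal sketch. The Name and Food classes are hypothetical stand-ins that carry only the fields needed here (they are not the library's IngredientText and FoundationFood classes), pair_names_with_foods is an invented helper, and the food description is a placeholder rather than a real FDC entry.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the library's name and foundation food objects.
@dataclass
class Name:
    text: str


@dataclass
class Food:
    text: str
    name_index: int  # index of the matching entry in the list of names


def pair_names_with_foods(
    names: list[Name], foods: list[Food]
) -> list[tuple[str, Optional[Food]]]:
    """Pair each name with its foundation food, or None if it has no match."""
    by_index = {food.name_index: food for food in foods}
    return [(name.text, by_index.get(i)) for i, name in enumerate(names)]


names = [Name("butter"), Name("olive oil")]
# Only the second name has a matching foundation food in this made-up example.
foods = [Food("placeholder description for olive oil", name_index=1)]
pairs = pair_names_with_foods(names, foods)
```

Looking foods up by name_index, rather than zipping the two lists together, keeps the pairing correct when some names have no matching foundation food.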
2.3.0
Note
This release only contains changes related to the development tools for this library. There are no changes to the functionality of the library.
Development tools
- Replace the labeller and webapp tools with a new tool ("webtools") written in React. Many thanks to @mcioffi for this contribution. Key functionality:
  - Parser, to display the parsed output of an input ingredient sentence.
  - Labeller, to edit the labelled training data or add new training data.
  - Trainer, to initiate training of models.

  See the docs for more information.
- When generating detailed results during model training (using --detailed), also generate a file detailing classification results for features.
2.2.0
Foundation foods:
- Bias foundation food matching to prefer "raw" FDC ingredients, but only if the ingredient name does not include any verbs that indicate the ingredient is not raw (e.g. "cooked").
- Normalise spelling of tokens in ingredient names to align with spelling used in FDC ingredient descriptions.
- Fix a bug where foundation foods were never calculated if separate_names=False.
General
- Add logging to library, under the ingredient-parser namespace.
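A minimal sketch of how log output under this namespace might be surfaced, using only the standard library logging module. It assumes the logger name is exactly "ingredient-parser" as stated above; which levels the library actually logs at is not stated here, so DEBUG is simply the most permissive choice.

```python
import logging

# Send log records to stderr, prefixed with the logger name and level.
logging.basicConfig(format="%(name)s %(levelname)s: %(message)s")

# Assumption: the library's loggers live under the "ingredient-parser" namespace.
logger = logging.getLogger("ingredient-parser")
logger.setLevel(logging.DEBUG)
```

Setting the level on the namespace logger, rather than on the root logger, avoids turning on debug output for unrelated libraries.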
Model
- Improve parser model performance with new features related to sentence structure, such as whether a token is part of an example phrase, a multi-ingredient phrase, or after the split in a compound sentence. See the Feature Generation page of the docs for more details.
Processing
- Improve post processing of names to avoid returning multiple names if the name is split by a non-name token. For example, in the sentence "8 fresh large basil leaves", the name should be returned as "fresh basil leaves" and not as two separate names: "fresh", "basil leaves".
2.1.1
- Pin Pint version to 0.24.4, as future versions intend to drop support for Python 3.10.
2.1.0
Warning
This version replaces the floret dependency with numpy.
Numpy was already a dependency of floret, so if you are upgrading from v2.0.0 there should be little impact.
This release overhauls the foundation foods functionality so that ingredient names are matched to entries in the FoodData Central (FDC) database.
- This update does not change the API. It adds additional fields to FoundationFood objects for FDC ID, category and data type. The text field now returns the description for the matching FDC entry.
- Beware that enabling this functionality causes the parse_ingredient function to be much slower than when disabled (default).

  |                      | foundation_foods=False (default) | foundation_foods=True |
  |----------------------|----------------------------------|-----------------------|
  | Sentences per second | ~1500                            | ~20                   |

- This functionality works entirely offline.
- See the foundation foods page of the docs for specifics.
2.0.0
Caution
This release contains some breaking changes
- ParsedIngredient.name is now a list of IngredientText objects, or an empty list if no name is identified.
- The quantity_fractions optional keyword argument has been removed. IngredientAmount.quantity and IngredientAmount.quantity_max return fractions.Fraction objects. Conversion to float can be achieved by e.g.:

      # Round to 3 decimal places
      round(float(quantity), 3)

- New dependency: floret.
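For code that previously relied on float quantities, the conversion can be wrapped in a small helper. This is only a sketch: to_float is a hypothetical name, and the string pass-through is an assumption based on these notes mentioning that a quantity can be a string.

```python
from fractions import Fraction
from typing import Union


def to_float(quantity: Union[Fraction, str], places: int = 3) -> Union[float, str]:
    """Convert a Fraction quantity to a rounded float; leave strings untouched."""
    if isinstance(quantity, str):
        return quantity
    return round(float(quantity), places)


one_and_a_third = to_float(Fraction(4, 3))  # 1.333
unparsed = to_float("a few")  # returned unchanged
```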
Processing
- Identify where multiple alternative ingredients are given for the stated amount. For example:

      # Simple example
      >>> parse_ingredient("2 tbsp butter or olive oil").name
      [
          IngredientText(text='butter', confidence=0.983045, starting_index=2),
          IngredientText(text='olive oil', confidence=0.930385, starting_index=4)
      ]
      # Complex example
      >>> parse_ingredient("2 cups chicken or beef stock").name
      [
          IngredientText(text='chicken stock', confidence=0.776891, starting_index=2),
          IngredientText(text='beef stock', confidence=0.94334, starting_index=4)
      ]

  This is enabled by default, but can be disabled by setting separate_ingredients=False in parse_ingredient. If disabled, the ParsedIngredient.name field will be a list containing a single IngredientText object.
- Set PREPARED_INGREDIENT flag on amounts in cases like "... to yield 2 cups ..."
- Add convert_to(...) function to IngredientAmount and CompositeIngredientAmount dataclasses to convert the amount to the given units. Conversion between mass and volume is also supported using a default density (density of water) that can be changed.

      >>> p = parse_ingredient("1 8 ounce can chopped tomatoes")
      >>> # Convert "8 ounce" to grams
      >>> p.amount[1].convert_to("g")
      IngredientAmount(quantity=Fraction(5669904625000001, 25000000000000),
                       quantity_max=Fraction(5669904625000001, 25000000000000),
                       unit=<Unit('gram')>,
                       text='226.80 gram',
                       confidence=0.999051,
                       starting_index=1,
                       APPROXIMATE=False,
                       SINGULAR=True,
                       RANGE=False,
                       MULTIPLIER=False,
                       PREPARED_INGREDIENT=False)
      >>> # Cannot convert where the quantity or unit is a string
      >>> p.amount[0].convert_to("g")
      TypeError: Cannot convert where quantity or unit is a string.
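The arithmetic behind these conversions can be sketched with plain Fraction values. This is not the library's implementation: ounces_to_grams and volume_to_mass are hypothetical helpers, the water density of 1 g/ml is an approximation, and the 0.92 g/ml figure is only an illustrative density.

```python
from fractions import Fraction

# One avoirdupois ounce is exactly 28.349523125 g (1 lb = 453.59237 g, 16 oz per lb).
GRAMS_PER_OUNCE = Fraction("28.349523125")


def ounces_to_grams(quantity: Fraction) -> Fraction:
    """Convert a mass in ounces to grams, keeping the result exact."""
    return quantity * GRAMS_PER_OUNCE


def volume_to_mass(
    volume_ml: Fraction, density_g_per_ml: Fraction = Fraction(1)
) -> Fraction:
    """Hypothetical helper: mass in grams = volume in ml x density in g/ml.

    The default density approximates water at 1 g/ml.
    """
    return volume_ml * density_g_per_ml


grams = ounces_to_grams(Fraction(8))  # Fraction(45359237, 200000)
label = f"{float(grams):.2f} gram"  # '226.80 gram'

water_mass = volume_to_mass(Fraction(250))  # 250 g for 250 ml of water
other_mass = volume_to_mass(Fraction(250), Fraction("0.92"))  # 230 g
```

The formatted result agrees with the text field shown in the example above. Passing a different density is what makes a volume to mass conversion meaningful for ingredients that are lighter or heavier than water.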
Model
- Include custom word embeddings as features used by the model. This requires a new dependency of the floret library.
1.3.2
Processing
- Fix bug that allowed fractions in the intermediate form (i.e. #1$2) to appear in the name, prep, comment, size, purpose fields of the ParsedIngredient output.
1.3.1
Warning
This version requires pint >=0.24.4
General
- Support Python 3.13. Requires pint >= 0.24.4.
1.3.0
Processing
- Various minor improvements to feature generation.
- Add PREPARED_INGREDIENT flag to IngredientAmount objects. This is used to indicate if the amount refers to the prepared ingredient (PREPARED_INGREDIENT=True) or the unprepared ingredient (PREPARED_INGREDIENT=False).
- Add starting_index attribute to IngredientText objects, indicating the index of the token that starts the IngredientText.
- Improve detection of composite amounts in sentences.
- Add quantity_fractions keyword argument to parse_ingredient. When True, the quantity and quantity_max fields of IngredientAmount objects will be fractions.Fraction objects instead of floats. This allows fractions such as 1/3 to be represented exactly. The default behaviour is quantity_fractions=False, where quantities are floats as previously. For example:

      >>> parse_ingredient("1 1/3 cups flour").amount[0]
      IngredientAmount(
          quantity=1.333,
          quantity_max=1.333,
          unit=<Unit('cup')>,
          text='1 1/3 cups',
          ...
      )
      >>> parse_ingredient("1 1/3 cups flour", quantity_fractions=True).amount[0]
      IngredientAmount(
          quantity=Fraction(4, 3),
          quantity_max=Fraction(4, 3),
          unit=<Unit('cup')>,
          text='1 1/3 cups',
          ...
      )
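To see why an exact representation matters, the short sketch below compares scaling the exact value with scaling the rounded float. It uses only the standard library fractions module and does not call the parser; the variable names are illustrative.

```python
from fractions import Fraction

# "1 1/3" as an exact fraction and as a float rounded to 3 decimal places.
exact = Fraction(1) + Fraction(1, 3)  # Fraction(4, 3)
rounded = 1.333

# Tripling the recipe: the exact value gives exactly 4, the rounded one does not.
tripled_exact = exact * 3
tripled_rounded = rounded * 3
```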
Model
- Addition of new dataset: tastecooking. This is a relatively small dataset, but includes a number of unique abbreviations for units and sizes.
1.2.0
General
- New optional keyword argument to extract foundation foods from the ingredient name. Foundation foods are the fundamental item of food, excluding any qualifiers or descriptive adjectives, e.g. for the name "organic cucumber", the foundation food is "cucumber".

  See https://ingredient-parser.readthedocs.io/en/latest/guide/foundation.html for additional details.
- Some minor post processing fixes.