0.1.0-beta4
Pre-release
- Include new source of training data: cookstr.
  - 10,000 additional ingredient sentences from the archive of 7918 recipes (~40,000 total ingredient sentences) found at https://archive.org/details/recipes-en-201706 are now used in the training of the model.
- The parse_ingredient function now returns a ParsedIngredient dataclass instead of a dict.
  - Remove dependency on typing_extensions as a result of this.
- A model card is now provided that describes how the model was trained, how it performs, how it is intended to be used, and its limitations.
  - The model card is distributed with the package, and there is a function show_model_card() that will open the model card in the default application for markdown files.
- Improvements to the ingredient sentence preprocessing:
  - Expand the list of units
  - Tweak the tokenizer to handle more punctuation
  - Fix various bugs with the cleaning steps
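
Because parse_ingredient now returns a dataclass rather than a dict, code that indexed the result by key will likely need to switch to attribute access. The sketch below illustrates that change with a hypothetical stand-in class; `ParsedIngredientSketch` and its field names are assumptions made for illustration, not the library's actual definition, so check the package documentation for the real fields.

```python
from dataclasses import dataclass, asdict


# Hypothetical stand-in for the library's ParsedIngredient dataclass.
# The real class may have different or additional fields.
@dataclass
class ParsedIngredientSketch:
    sentence: str
    quantity: str
    unit: str
    name: str


parsed = ParsedIngredientSketch(
    sentence="2 cups flour", quantity="2", unit="cups", name="flour"
)

# Attribute access replaces the old dict-style lookup, e.g. parsed["name"].
name = parsed.name

# dataclasses.asdict() recovers a plain dict for code that still expects one.
as_dict = asdict(parsed)
```

Converting with `asdict()` may be a convenient stopgap for callers that cannot be migrated to attribute access right away.
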
As a result of these updates, the model performance has improved to:
Sentence-level results:
Total: 12030
Correct: 10776
Incorrect: 1254
-> 89.58% correct
Word-level results:
Total: 75146
Correct: 72329
Incorrect: 2817
-> 96.25% correct
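
The percentages above follow directly from the reported counts. As a quick check, the following sketch recomputes them; the helper name `percent_correct` is an illustrative assumption and not part of the package.

```python
def percent_correct(correct: int, total: int) -> float:
    """Return the share of correct items as a percentage, rounded to 2 places."""
    return round(100 * correct / total, 2)


# Counts taken from the results reported in these release notes.
sentence_accuracy = percent_correct(10776, 12030)
word_accuracy = percent_correct(72329, 75146)

print(f"Sentence-level: {sentence_accuracy}% correct")
print(f"Word-level: {word_accuracy}% correct")
```
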