0.1.0-beta11

Pre-release

strangetom released this 27 May 16:43

3a16425

General

Refactor package structure to make it more suitable for expansion to other languages.

Note: There aren't any plans to support other languages yet.

Model

Reduce duplication in training data
Introduce PURPOSE label for tokens that describe the purpose of the ingredient, such as "for the dressing" and "for garnish".
Replace quantities with "!num" when determining the features for tokens so that the model doesn't need to learn all possible values quantities can take. This results in a small reduction in model size.
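
As a rough illustration of the quantity replacement described above, the following minimal sketch maps numeric tokens to the "!num" placeholder before features are computed. The regular expression and the feature_token helper are assumptions made for this example, not the package's actual implementation, which may recognise more quantity formats.

```python
import re

# Hypothetical pattern (an assumption for this sketch): integers, decimals,
# simple fractions such as "1/2", and ranges such as "1-2".
NUMERIC = re.compile(r"\d+(?:[./]\d+)?(?:-\d+(?:[./]\d+)?)?")


def feature_token(token: str) -> str:
    """Return "!num" for numeric tokens, otherwise the token unchanged."""
    return "!num" if NUMERIC.fullmatch(token) else token


tokens = ["2", "1/2", "cups", "1.5", "flour"]
features = [feature_token(t) for t in tokens]
```

Because every quantity collapses to the same placeholder, the model only has to learn one feature value for "a number appears here" rather than one per distinct quantity.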

Processing

Various bug fixes to post-processing of tokens with labels NAME, COMMENT, PREP, PURPOSE, SIZE to correct punctuation and confidence calculations.
Modify the tokeniser to split full stops from the end of tokens. This helps the model avoid treating "token." and "token" as different cases to learn.
Add fallback functionality to parse_ingredient for cases where none of the tokens are labelled as NAME. In these cases, the name is taken to be the token with the highest confidence of being labelled NAME, even if a different label has a higher confidence for that token. This can be disabled by setting expect_name_in_output=False in parse_ingredient.
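
The tokeniser change and the NAME fallback above can be sketched as follows. This is an illustrative approximation only: split_trailing_full_stop and choose_name are hypothetical helpers written for this example, and the per-token confidence dictionaries are an assumed data shape. Only the expect_name_in_output option name comes from the release notes.

```python
from typing import Optional


def split_trailing_full_stop(token: str) -> list[str]:
    """Split one trailing full stop off a token, e.g. "chopped." -> ["chopped", "."]."""
    if len(token) > 1 and token.endswith("."):
        return [token[:-1], "."]
    return [token]


def choose_name(
    tokens: list[str],
    confidences: list[dict[str, float]],
    expect_name_in_output: bool = True,
) -> Optional[str]:
    """Pick a name token, falling back to the most NAME-like token if needed.

    confidences[i] maps each label to the (assumed) confidence for tokens[i].
    """
    # Most likely label for each token.
    labels = [max(c, key=c.get) for c in confidences]
    if "NAME" in labels:
        return tokens[labels.index("NAME")]
    if not expect_name_in_output:
        return None
    # Fallback: the token with the highest NAME confidence, even though
    # another label scored higher for that token.
    best = max(range(len(tokens)), key=lambda i: confidences[i].get("NAME", 0.0))
    return tokens[best]


tokens = ["for", "the", "dressing"]
confidences = [
    {"PURPOSE": 0.90, "NAME": 0.05},
    {"PURPOSE": 0.85, "NAME": 0.10},
    {"PURPOSE": 0.60, "NAME": 0.35},
]
name = choose_name(tokens, confidences)
```

In this sketch no token is labelled NAME, so the fallback selects "dressing"; passing expect_name_in_output=False returns None instead.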
