Spacy and working with fractions as text #9442
I am working on an application that reads culinary recipes. I have discovered that I need to work around how spaCy normally tags food, units of measurement and numbers. I have used entity rulers to resolve the first two issues with no problems, and I am really pleased with the accuracy.
The problem lies in how spaCy tags fractions. It recognizes some numbers in recipes as QUANTITY entities (e.g. "2 1/2 teaspoons" as a single entity) and some as CARDINAL (e.g. "2 cups", recognized separately as a CARDINAL plus a noun). I tried creating a special entity ruler to recognize all numbers and fractions ("amts") as a new entity, but this does not work. The slash symbol "/" in the textual representation of a fraction seems to force spaCy to recognize the numerator and denominator as separate numbers. I played with escaping the "/", knowing full well that it probably would not work. I was right about that, at least.
So my question is: now that I have entity rulers that accurately recognize ingredients and units of measurement as ents, how can I get spaCy to recognize fractions as they appear in many recipes? I thought maybe I could do this using token.pos_ == "NUM", but that doesn't work either. Any suggestions? Thanks in advance. Robert
Replies: 1 comment 1 reply
I'd like to see how you're writing the EntityRuler patterns for this and how you're adding them to your pipeline, so I know what you've tried so far.
In the examples I've tried, numbers with slashes are consistently recognized as single tokens, even when the phrase as a whole isn't recognized as a QUANTITY by the default NER. This should allow you to write an EntityRuler pattern that matches a NUM followed by certain NOUNs and labels the span as QUANTITY.
The example I tried has no problem with the slash. The forward slash (/) doe…
I haven't come across any examples where the numerator and denominator come out as separate NUM tokens, which is why I'd like to see an example of what you've tried.
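The suggestion above (match a number-like token followed by a unit and label the span QUANTITY) could be sketched roughly as follows. This is not the code from the original reply. It is a minimal illustration that assumes spaCy v3, uses a blank English pipeline so no trained model is required, and therefore matches on the lexical attribute LIKE_NUM instead of the POS tag NUM. The UNITS list and the sample sentence are hypothetical.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model needed.
# Assumption: spaCy v3, where "entity_ruler" is a built-in component name.
nlp = spacy.blank("en")

# Hypothetical unit list for illustration; extend it for real recipes.
UNITS = ["cup", "cups", "teaspoon", "teaspoons", "tablespoon", "tablespoons"]

ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {
        "label": "QUANTITY",
        "pattern": [
            # One or more number-like tokens, e.g. "2" and "1/2".
            {"LIKE_NUM": True, "OP": "+"},
            # Followed by a known unit, matched case-insensitively.
            {"LOWER": {"IN": UNITS}},
        ],
    }
])

doc = nlp("Add 2 1/2 teaspoons of salt and 2 cups of flour.")

# Inspect how the fraction was tokenized and which spans were labeled.
tokens = [token.text for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(tokens)
print(entities)
```

If the fraction shows up as one token in the printed token list, the slash is not the problem and the pattern itself is the place to look. With a trained pipeline loaded, the ruler would probably need to be added with `before="ner"` (or with `overwrite_ents` enabled) so that its spans are not displaced by the statistical NER.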