Further classifying NER based on content #11565
Replies: 1 comment 2 replies
-
It sounds like you should extract the entities, treat them as separate Docs, and run a textcat over them. This would involve a separate pipeline. However, since the resulting Docs will be very short, and classifying short text is a tricky problem in general, it might be more effective to use a rule-based solution if you have access to a good food database. FoodData Central by the USDA has good information, though its categories are not quite the same as the tags you suggest.

If you want to tie the information back into the parent pipeline, you could store the labels produced by the textcat in a custom attribute (underscore attribute) on the entity spans. You could do this entirely in a component in the main pipeline, with a sub-pipeline contained in the component, or as a post-processing step outside the pipeline.
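To make the custom-attribute idea concrete, here is a minimal sketch of the rule-based variant, assuming spaCy v3. The extension name `food_categories`, the component name `food_categorizer`, and the tiny `CATEGORY_KEYWORDS` lexicon are all made up for illustration; a real lexicon would come from a food database, and a sub-pipeline with a textcat could be called inside the component instead of the lookup. An `entity_ruler` on a blank pipeline stands in for your trained NER so the example runs without a model.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span

# Hypothetical keyword lexicon (illustration only, not from any real database).
CATEGORY_KEYWORDS = {
    "sweet": {"raspberry", "pastries", "chocolate"},
    "hearty": {"mushroom", "puff", "stew"},
    "vegetarian": {"mushroom", "raspberry"},
}

# Register the custom (underscore) attribute on entity spans once.
if not Span.has_extension("food_categories"):
    Span.set_extension("food_categories", default=None)


@Language.component("food_categorizer")
def food_categorizer(doc):
    """Attach a list of category labels to every food_item entity."""
    for ent in doc.ents:
        if ent.label_ != "food_item":
            continue
        words = {token.lower_ for token in ent}
        # A category applies if any of its keywords occurs in the entity.
        ent._.food_categories = sorted(
            cat for cat, keywords in CATEGORY_KEYWORDS.items() if words & keywords
        )
    return doc


# Stand-in for the trained NER: a blank pipeline with an entity ruler.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "food_item", "pattern": "puff pastry with mushroom filling"},
    {"label": "food_item", "pattern": "raspberry pastries"},
])
# The categorizer must run after the component that sets doc.ents.
nlp.add_pipe("food_categorizer", last=True)

doc = nlp(
    "As a main course I will serve puff pastry with mushroom filling, "
    "followed by raspberry pastries as dessert."
)
results = {ent.text: ent._.food_categories for ent in doc.ents}
for text, categories in results.items():
    print(text, categories)
```

Because the labels live on the spans, they stay available wherever you later iterate over `doc.ents`. With a trained pipeline you would drop the ruler and add the component after `ner` instead.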
-
Hi all,
I use a custom trained NER to detect food_items/dishes in sentences, which is performing well.
doc = nlp('As a main course I will serve puff pastry with mushroom filling, followed by raspberry pastries as dessert.')
The pipeline correctly identifies ['puff pastry with mushroom filling', 'raspberry pastries'] as food_item and with
ent[0].vector
I can also obtain the averaged vectors. I am aware of the limitations of averaged token vectors, but I would still like to use them to classify the extracted entities into one or more categories, e.g. 'sweet', 'hearty', 'vegetarian', 'vegan', ... Ideally, I would like to do this as a further step in the pipeline and then tie the result to the NER output so I can access it later on. How would I do that?