Further classifying NER based on content #11565

DerDiego13 · 2022-09-30T16:39:13Z

Hi all,
I use a custom trained NER to detect food_items/dishes in sentences, which is performing well.

doc = nlp('As a main course I will serve puff pastry with mushroom filling, followed by raspberry pastries as dessert.')

The pipeline correctly identifies ['puff pastry with mushroom filling', 'raspberry pastries'] as food_item and with

doc.ents[0].vector

I can also obtain the averaged vectors. I know about the problems that averaged token vectors have, but I would still like to use them to classify the extracted entities into one or more categories, e.g. 'sweet', 'hearty', 'vegetarian', 'vegan', ... Ideally, I would do this as a further step in the pipeline and tie the result to the entities so that I can access it later on. How would I do that?

polm · 2022-10-03T04:49:22Z

It sounds like you should extract the entities, treat them as separate Docs, and run a textcat over them. This would involve a separate pipeline.
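
A minimal sketch of the separate-pipeline idea, assuming spaCy v3. The labels and the four training phrases are made up for illustration, and a toy model like this will not give meaningful scores. The point is only the shape of the workflow: the entity texts become their own short Docs, and the categories end up in `doc.cats`.

```python
import spacy
from spacy.training import Example

# Hypothetical labels and toy training data, for illustration only.
LABELS = ["sweet", "hearty"]
TRAIN = [
    ("chocolate cake", {"sweet": 1.0, "hearty": 0.0}),
    ("apple pie", {"sweet": 1.0, "hearty": 0.0}),
    ("beef stew", {"sweet": 0.0, "hearty": 1.0}),
    ("potato gratin", {"sweet": 0.0, "hearty": 1.0}),
]

# A second pipeline that contains only a multilabel text classifier.
textcat_nlp = spacy.blank("en")
textcat = textcat_nlp.add_pipe("textcat_multilabel")
for label in LABELS:
    textcat.add_label(label)

examples = [
    Example.from_dict(textcat_nlp.make_doc(text), {"cats": cats})
    for text, cats in TRAIN
]
optimizer = textcat_nlp.initialize(lambda: examples)
for _ in range(10):
    textcat_nlp.update(examples, sgd=optimizer)

# In the real setup these strings would come from the NER pipeline:
# entity_texts = [ent.text for ent in doc.ents]
entity_texts = ["puff pastry with mushroom filling", "raspberry pastries"]
for ent_doc in textcat_nlp.pipe(entity_texts):
    print(ent_doc.text, ent_doc.cats)
```

Each value in `cats` is a score between 0 and 1 per label, so several labels can apply to the same entity.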

However, since the resulting Docs will be very short, it might be more effective to use a rule-based solution if you have access to a good food database; FoodData Central from the USDA has good information, though it's not quite the same as the tags you suggest. Note that classifying short text in general is a tricky problem.

If you want to tie the information back into the parent pipeline, you could set the labels resulting from the textcat to a custom attribute (underscore attribute) on the entity spans. You could do this entirely in a component in the main pipeline, with a sub-pipeline contained in the component, or you could do it as a post-processing step outside the pipeline.
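
A rough sketch of the in-pipeline variant, assuming spaCy v3. The attribute name `food_categories`, the component name `food_categorizer`, and the keyword table are all hypothetical; the `classify` function is only a stand-in for a trained textcat sub-pipeline, and an entity ruler stands in for the custom NER from the question.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span

# Custom attribute that holds the category labels for each entity span.
Span.set_extension("food_categories", default=None, force=True)

# Hypothetical keyword table standing in for a trained classifier.
KEYWORDS = {
    "sweet": {"raspberry"},
    "hearty": {"mushroom"},
}


def classify(text):
    # Stand-in for a textcat sub-pipeline. With a trained one this could be:
    # [label for label, score in textcat_nlp(text).cats.items() if score >= 0.5]
    words = set(text.lower().split())
    return [label for label, keys in KEYWORDS.items() if words & keys]


@Language.component("food_categorizer")
def food_categorizer(doc):
    # Store the predicted categories on every food_item entity.
    for ent in doc.ents:
        if ent.label_ == "food_item":
            ent._.food_categories = classify(ent.text)
    return doc


# The entity ruler stands in for the custom-trained NER.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "food_item", "pattern": "puff pastry with mushroom filling"},
    {"label": "food_item", "pattern": "raspberry pastries"},
])
nlp.add_pipe("food_categorizer", last=True)

doc = nlp(
    "As a main course I will serve puff pastry with mushroom filling, "
    "followed by raspberry pastries as dessert."
)
for ent in doc.ents:
    print(ent.text, ent._.food_categories)
```

Because the component runs after the NER step, the categories are available on `doc.ents` as soon as the pipeline has processed the text.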

2 replies

DerDiego13 Oct 5, 2022
Author

Thank you. I will try the sub-pipeline. A rule-based solution is difficult as you can see with the word pastry above.

polm Oct 5, 2022

I don't think you could use direct lookup with a rule-based solution, but you might be able to extract the head (span.root, which would work on entities) and use that for lookup. It would also work if you merge noun phrases first. Another option would be to use doc.similarity on the names.
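
To illustrate the head lookup, here is a small sketch assuming spaCy v3. The parse is written by hand so that the example runs without a trained parser; in a real pipeline the heads would come from the parser and `span` would be one of `doc.ents`. The `FOOD_DB` table is hypothetical.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hypothetical lookup table keyed by head noun; a real one would be
# built from a food database.
FOOD_DB = {"pastry": "baked goods"}

# Hand-written dependency parse standing in for parser output.
words = ["puff", "pastry", "with", "mushroom", "filling"]
heads = [1, 1, 1, 4, 2]  # absolute index of each token's head
deps = ["compound", "ROOT", "prep", "compound", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

span = doc[0:5]
head = span.root  # the token in the span closest to the sentence root
print(head.text, FOOD_DB.get(head.text.lower()))
```

The head only identifies the base item, so modifiers such as "mushroom" would still need to be checked separately to tell a hearty pastry from a sweet one.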

Also note that it's not the same as picking labels for items, but linking entities to an external database is Named Entity Linking, and may be something you should take a look at.
