Improving clause segmentation in the textcat component to get complete information from text #10365
Hello! I have been using the approach from spaCy's Healthsea project for a project of my own. The clause segmentation logic works well for short, summarized use cases where everything is condensed into single sentences, but I think it could be improved for information spread across multiple sentences. Consider the following example:
Using the current logic, if Nifty 50 is the entity here, the Benepar parser splits the text into sentences.
Then, according to sentence-wise sentiment, Sent 1 will be neutral for entity
Hello,
You're exactly right about this problem. In Healthsea, whenever the model classifies an entity as "Anamnesis" (or, in your case, "Neutral"), we create a temporary cache that collects the entity and tries to pair it with sentiments detected in the following sentences that don't include any entities.
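To make the caching rule concrete, here is a minimal sketch in plain Python. It is not Healthsea's actual code: the `Clause` class, the `pair_entities_with_sentiment` function, the label strings, and the example sentences are all hypothetical, and the labels are assigned by hand to stand in for the textcat output.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NEUTRAL = "NEUTRAL"  # hypothetical label name, playing the role of "Anamnesis"


@dataclass
class Clause:
    """One segmented sentence/clause with its predicted label (hypothetical type)."""
    text: str
    label: str
    entities: List[str] = field(default_factory=list)


def pair_entities_with_sentiment(clauses: List[Clause]) -> List[Tuple[str, str]]:
    """Pair each entity with a label, letting cached neutral entities
    pick up the sentiment of a later clause that contains no entities."""
    results: List[List[str]] = []  # [entity, label] pairs in order of appearance
    cache: List[int] = []          # indices in `results` still waiting for a sentiment

    for clause in clauses:
        if clause.entities:
            # A clause with its own entities ends the lookahead for older ones.
            cache = []
            for ent in clause.entities:
                results.append([ent, clause.label])
                if clause.label == NEUTRAL:
                    cache.append(len(results) - 1)
        elif clause.label != NEUTRAL and cache:
            # Entity-free clause with a sentiment: assign it to the cached entities.
            for idx in cache:
                results[idx][1] = clause.label
            cache = []

    return [(ent, label) for ent, label in results]


# Hypothetical input; labels are hand-assigned, not model predictions.
clauses = [
    Clause("Nifty 50 opened today.", NEUTRAL, ["Nifty 50"]),
    Clause("It fell sharply by the close.", "NEGATIVE"),
    Clause("Sensex gained.", "POSITIVE", ["Sensex"]),
]
print(pair_entities_with_sentiment(clauses))
```

In this sketch the cache is cleared as soon as a clause with its own entities appears, so a sentiment is never carried across a new entity mention. That is one possible design choice, not necessarily the one Healthsea makes.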
Since this is a rule-based approach, it has its own limitations, which is why I'd be really interested to see how well co-reference resolution would work in this case.