Improving clause segmentation in the textcat component to get complete information from text #10365

shrinidhin · 2022-02-24T07:35:11Z

Hello! I have been using the approach from Healthsea by spaCy for a project. The clause segmentation logic works well for short, summarized use cases where everything is condensed into single sentences, but I think it could be improved for information spread across multiple sentences. Consider the following example:

Nifty 50 has opened at 17,600 today. The index has shown a remarkable jump of 5 percent from the previous day.

With the current logic, if Nifty 50 is the entity here, the Benepar parser splits the text into sentences:

Sent 1: Nifty 50 has opened at 17,600 today.
Sent 2: The index has shown a remarkable jump of 5 percent from the previous day.

With sentence-wise sentiment, Sent 1 is neutral for the entity Nifty 50. Sent 2 carries the actual information, namely that the index has jumped, which is a positive sentiment for the index. The current logic does not capture that information because Sent 2 doesn't contain the entity name. I thought of using co-reference resolution for this, but I doubt it will help here. As I understand it, co-reference resolution relies on pronouns, and Sent 2 refers to the entity with a noun phrase ("The index") instead. I am looking for any inputs or suggestions that would be helpful. Thank you!

Answered by thomashacker

Feb 28, 2022

thomashacker · 2022-02-28T11:24:18Z

Hello,
You're exactly right about this problem. In Healthsea, whenever the model classifies an entity as "Anamnesis" (or, in your case, "Neutral"), we create a temporary cache that holds the entity and tries to pair it with sentiments detected in the following sentences that don't contain any entities.

Nifty 50 has opened at 17,600 today. -> (Nifty 50, Neutral)
The index has shown a remarkable jump of 5 percent from the previous day. -> (None, Positive)

(Nifty 50, Neutral) -> (None, Positive) -> (Nifty 50, Positive)

Since this is a rule-based approach, it has its own limitations, which is why I'd be really interested to see how well co-reference resolution would work in this case.
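
As a rough illustration of the caching rule described above, here is a minimal sketch in plain Python. It is not Healthsea's actual implementation: the function name `pair_with_cache`, the `(entity, label)` tuple format, and the `neutral_label` parameter are all assumptions made for this example, and the per-sentence predictions are taken as given rather than produced by a real textcat model.

```python
from typing import List, Optional, Tuple

# One prediction per sentence/clause: (entity text or None, predicted label).
# This input format is an assumption for the sketch, not Healthsea's data model.
Prediction = Tuple[Optional[str], str]


def pair_with_cache(
    predictions: List[Prediction], neutral_label: str = "Neutral"
) -> List[Tuple[str, str]]:
    """Attach sentiments from entity-free sentences to a cached neutral entity."""
    results: List[Tuple[str, str]] = []
    cached_index: Optional[int] = None  # position in results of the waiting entity

    for entity, label in predictions:
        if entity is not None:
            results.append((entity, label))
            # Only cache entities whose own sentence carried no sentiment.
            cached_index = len(results) - 1 if label == neutral_label else None
        elif cached_index is not None and label != neutral_label:
            # Entity-free sentence with a sentiment: hand it to the cached entity.
            cached_entity, _ = results[cached_index]
            results[cached_index] = (cached_entity, label)
            cached_index = None
        # Entity-free sentences with nothing cached are ignored.

    return results


example = [("Nifty 50", "Neutral"), (None, "Positive")]
print(pair_with_cache(example))  # [('Nifty 50', 'Positive')]
```

In this sketch the cache holds a single entity and is cleared once a sentiment has been paired or a new entity appears. That choice is one of the limitations of a rule like this: a sentiment that actually refers to an earlier entity would be attached to the most recent one, or dropped.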

3 replies

shrinidhin Feb 28, 2022
Author

@thomashacker Thank you very much! I believe there is a co-reference resolution component in development by spaCy. Is there any update on when it will be released?
Although there is material available on implementing co-reference resolution, it would be easier and quicker to add a spaCy co-reference resolution component to an existing pipeline.

thomashacker Mar 2, 2022

Yes, it's currently in development. We're aiming to release it soon but don't have a specific date yet.

shrinidhin Mar 4, 2022
Author

Alright.
