Hello,
Yes, you are right.
The segmentation step splits long documents or sentences into smaller chunks while preserving the context of each chunk:

"This is great for joint pain, but it also causes rashes" -> "This is great for joint pain", "It also causes rashes"
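To illustrate the idea only, here is a toy sketch that reproduces the example above by splitting on a contrastive ", but". It is an assumption made for illustration, not the actual segmentation component, which would need to handle far more sentence structures. The function name `split_on_contrast` is hypothetical.

```python
import re
from typing import List


def split_on_contrast(sentence: str) -> List[str]:
    """Toy segmentation: split a sentence on ', but' and tidy up each chunk.

    Hypothetical helper for illustration; a real segmenter needs to cover
    many more constructions than this single pattern.
    """
    chunks = []
    for part in re.split(r",\s*but\s+", sentence):
        part = part.strip().rstrip(".")
        if part:
            # Capitalize the first character so each chunk reads as a sentence.
            chunks.append(part[0].upper() + part[1:])
    return chunks


segments = split_on_contrast(
    "This is great for joint pain, but it also causes rashes"
)
for segment in segments:
    print(segment)
```

Each resulting chunk mentions a single condition, which is what makes the sentiment easier to predict.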

The segmented chunks make it easier for the textcat to predict the sentiment. As you mentioned, a chunk can still contain multiple entities with different sentiments even after segmentation. For that case, we try to use blinding, where one entity at a time is replaced with a placeholder:

"This is great for joint pain, rashes but not arthritis"
-> "This is great for <CONDITION>, rashes but not arthritis" (POSITIVE)
-> "This is great for joint pain, <CONDITION> but not arthritis" (…
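
The blinding idea above can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the entity character offsets are already known (in practice they would come from an NER step); the helper name `blind_entities` and the way the spans are looked up here are hypothetical and only for illustration.

```python
from typing import List, Tuple

MASK = "<CONDITION>"


def blind_entities(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Return one variant of `text` per entity span, masking only that span.

    `spans` holds (start, end) character offsets. Hypothetical helper,
    not part of any library.
    """
    variants = []
    for start, end in spans:
        variants.append(text[:start] + MASK + text[end:])
    return variants


text = "This is great for joint pain, rashes but not arthritis"
entities = ["joint pain", "rashes", "arthritis"]
# Assumption: each entity string occurs exactly once in the text.
spans = [(text.index(e), text.index(e) + len(e)) for e in entities]

for variant in blind_entities(text, spans):
    print(variant)
```

Each variant can then be passed to the text classifier separately, so that the predicted sentiment applies to the masked entity only.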

Answer selected by svlandeg
Labels
feat / textcat Feature: Text Classifier feat / training Feature: Training utils, Example, Corpus and converters