Extending the Sentencizer | Custom rules #9913
Hi spaCy experts, we have tested and compared the default sentencizer (parser), senter, and SentenceRecognizer. Please advise how we can extend the rule-based Sentencizer with the custom logic mentioned above. Thanks
Replies: 1 comment
If you already have working logic that includes rules, instead of using the Sentencizer you can create a small custom component that assigns `is_sent_start` to all tokens in a Doc. The Sentencizer is only meant for very simple punctuation-based sentence splitting.

Note that if you're concerned about cases like that, you usually want a statistical model to handle ambiguous cases like "He works for Stuff Inc. I don't.", where an abbreviation is also the end of a sentence.
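
For illustration, here is a minimal sketch of what such a component could look like. The component name `custom_sentence_rules` and the `SENTENCE_ENDERS` set (including the semicolon rule) are hypothetical placeholders, not part of spaCy and not the rules from the original question; substitute your own logic.

```python
import spacy
from spacy.language import Language

# Hypothetical rule set for illustration only: tokens that close a sentence.
# The semicolon stands in for whatever custom logic you already have.
SENTENCE_ENDERS = {".", "!", "?", ";"}


@Language.component("custom_sentence_rules")
def custom_sentence_rules(doc):
    """Assign is_sent_start to every token in the Doc."""
    for i, token in enumerate(doc):
        if i == 0:
            # The first token always starts a sentence.
            token.is_sent_start = True
        else:
            # A token starts a sentence if the previous token is an ender.
            token.is_sent_start = doc[i - 1].text in SENTENCE_ENDERS
    return doc


# A blank pipeline has only a tokenizer, so no trained model is needed.
nlp = spacy.blank("en")
nlp.add_pipe("custom_sentence_rules")

doc = nlp("First part; second part. Third.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

Because the component sets a value for every token, nothing is left undecided for a later component. If you combine it with a parser or senter instead, you would probably want to add it before that component and leave ambiguous tokens as `None`, but that depends on your pipeline.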