[English] Multiple sentences grouped into one when a newline character is used #11402
How to reproduce the behaviour
Current output
As you can see, multiple sentences are grouped together in the second recognized sentence by spaCy.
Expected output
When I use a space instead of a newline character, the output is as expected.
Your Environment
Replies: 1 comment 1 reply
Sorry this isn't working well for you. Our pretrained components use data that doesn't contain newlines, so they don't always handle them well. We use whitespace augmentation when training to help with this, but for sentence boundaries in particular, the models have probably learned that only a period followed by a space terminates a sentence.
In this case I would preprocess your text to convert newlines to spaces, or, if your needs are more complicated, train a sentence recognizer.
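For the preprocessing route, something like the following might be enough. It is only a sketch: the helper name `newlines_to_spaces` and the sample text are made up for illustration, and it assumes you are fine with collapsing each run of line breaks (plus any whitespace around it) into a single space.

```python
import re


def newlines_to_spaces(text: str) -> str:
    """Replace each run of line breaks, with surrounding whitespace, by one space.

    Hypothetical helper, not part of spaCy.
    """
    # \s* on both sides absorbs spaces, tabs and further line breaks, so
    # "a.\n \nb" becomes "a. b" rather than "a.  b".
    collapsed = re.sub(r"\s*[\r\n]+\s*", " ", text)
    # Also drops leading/trailing whitespace left over from the substitution.
    return collapsed.strip()


raw = "This is one sentence.\nThis is another.\n\nAnd a third."
clean = newlines_to_spaces(raw)
print(clean)  # This is one sentence. This is another. And a third.
```

You would then pass `clean` to the pipeline instead of the raw text. Keep in mind that character offsets in the resulting `Doc` refer to the cleaned string, so they may not line up with the original text wherever several characters were collapsed into one space.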