Trailing dot handling #12930
Trailing dots on numbers are handled differently for English and German. English splits the trailing dot into its own token; German does not. Neither language splits the trailing dot off "m.".
Not sure if this is due to some tweaks to handle cases like
In our case, however, we would always want the
I saw the example for https://spacy.io/usage/linguistic-features#tokenization for

How to reproduce the behaviour
Note: The differences in the
For English I get:
For German I get:
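One plausible cause, not confirmed in this thread, is the suffix rule that splits a trailing period off a token. If that rule's lookbehind allows a digit before the period, "3." becomes "3" + "."; if it does not (German might omit digits so that ordinals such as "3." stay intact), the number keeps its dot. The sketch below models only that single rule with Python's `re`. The patterns `EN_STYLE` and `DE_STYLE` and the helper functions are hypothetical simplifications, not spaCy's actual suffix lists.

```python
import re

# Hypothetical, simplified stand-ins for a "split off a trailing period" suffix rule.
# EN_STYLE allows a digit or lowercase letter before the period.
# DE_STYLE allows only a lowercase letter, so a number like "3." is left intact.
EN_STYLE = r"(?<=[0-9a-z])\."
DE_STYLE = r"(?<=[a-z])\."


def compile_suffixes(pieces):
    """Anchor each pattern at the end of the string and OR them together."""
    return re.compile("|".join(piece + "$" for piece in pieces))


def split_trailing_period(token, suffix_re):
    """Split one matching suffix off the end of a token, if there is one."""
    match = suffix_re.search(token)
    # No match, or the match would consume the whole token: keep it as one piece.
    if match is None or match.start() == 0:
        return [token]
    return [token[: match.start()], token[match.start():]]


en_suffix = compile_suffixes([EN_STYLE])
de_suffix = compile_suffixes([DE_STYLE])

for word in ["3.", "Haus."]:
    print(word, split_trailing_period(word, en_suffix), split_trailing_period(word, de_suffix))
# 3. ['3', '.'] ['3.']
# Haus. ['Haus', '.'] ['Haus', '.']
```

In spaCy itself, the tokenization docs linked above describe the analogous change: extend `nlp.Defaults.suffixes` with an extra pattern, compile the list with `spacy.util.compile_suffix_regex`, and assign the compiled regex's `.search` to `nlp.tokenizer.suffix_search`. The "m." case is probably a tokenizer special case rather than a suffix rule; special cases are checked before affixes are split, so a suffix change alone would likely not affect it.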
Your Environment
I managed to fix the difference between English and German by adding after I found #7303: