What is the abbreviation list provided by spaCy out of the box? #11196
Answered by polm
pulkitmehtawork asked this question in Help: Other Questions
When we do sentence segmentation with spaCy, we often come across abbreviations, and it doesn't break sentences at them. Can you please explain how to get the full list of abbreviations inside spaCy?
Answered by polm on Jul 26, 2022
Answer selected by pulkitmehtawork
The way the tokenizer works is somewhat complicated, so there isn't a single list of abbreviations anywhere, but the related data is in tokenizer_exceptions.py. That link is for English, but there are different settings for different languages. If you want to customize the tokenizer, see the tokenization docs.
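For anyone who wants to look at that data from Python instead of reading the source file, here is a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained model is needed. The "ends with a period" filter is only a heuristic chosen for this example, not something spaCy defines as an abbreviation list, and "approx." is a hypothetical custom entry used purely for illustration.

```python
import spacy
from spacy.symbols import ORTH

# A blank English pipeline carries the English tokenizer exceptions
# without requiring a trained model download.
nlp = spacy.blank("en")

# Tokenizer.rules maps each special-case string to the tokens it produces.
rules = nlp.tokenizer.rules

# Heuristic (an assumption for this sketch): treat special cases that
# contain a letter and end in "." as abbreviations.
abbreviations = sorted(
    s for s in rules
    if s.endswith(".") and any(ch.isalpha() for ch in s)
)
print(len(abbreviations), abbreviations[:20])

# Hypothetical custom abbreviation: keep "approx." together as one token.
nlp.tokenizer.add_special_case("approx.", [{ORTH: "approx."}])
tokens = [t.text for t in nlp("It takes approx. two hours.")]
print(tokens)
```

Swapping "en" for another language code shows that language's exceptions. Note that this only covers tokenization; whether a sentence boundary is placed after such a token still depends on which sentence segmentation component the pipeline uses.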