What is the abbreviation list provided by spaCy out of the box? #11196

pulkitmehtawork · 2022-07-25T12:50:43Z


When we do sentence segmentation using spaCy, we often come across abbreviations, and spaCy doesn't break sentences at them. Can you please advise how to get the list of all abbreviations inside spaCy?

Answered by polm


polm · 2022-07-26T04:15:44Z


The way the tokenizer works is somewhat complicated, so there isn't a single list of abbreviations anywhere, but the related data is in tokenizer_exceptions.py (spacy/lang/en/tokenizer_exceptions.py in the spaCy repository). That file is for English; other languages have their own settings.
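As a rough way to look at that data from Python rather than reading the source file, something like the following sketch may help. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained model is needed. The "ends with a period and contains a letter" filter is only a heuristic chosen for this example; spaCy itself does not label these entries as abbreviations.

```python
import spacy

# A blank English pipeline carries the language's tokenizer defaults
# without requiring a trained model to be downloaded.
nlp = spacy.blank("en")

# Tokenizer.rules maps each special-case string to the list of token
# attributes it should be split into.
rules = nlp.tokenizer.rules

# Heuristic (an assumption for this sketch, not spaCy's own definition):
# treat exceptions that end in a period and contain at least one letter
# as abbreviation-like.
abbreviation_like = sorted(
    text
    for text in rules
    if text.endswith(".") and any(ch.isalpha() for ch in text)
)

print(len(abbreviation_like))
print(abbreviation_like[:20])
```

Swapping "en" for another language code should show that language's exceptions, provided any extra dependencies that language needs are installed.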

If you want to customize the tokenizer, see the tokenization docs.
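For the simple case of teaching the tokenizer one more abbreviation, a minimal sketch using the tokenizer's `add_special_case` method could look like this. The abbreviation "approx." and the example sentence are made up for illustration, and the blank English pipeline is again an assumption to keep the example self-contained.

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")
text = "It costs approx. ten dollars."

# Tokens before customization, using only the built-in exceptions.
before = [token.text for token in nlp(text)]

# Register "approx." (a hypothetical example) as a special case so the
# tokenizer keeps the trailing period attached to the word.
nlp.tokenizer.add_special_case("approx.", [{ORTH: "approx."}])

# Tokens after customization.
after = [token.text for token in nlp(text)]

print(before)
print(after)
```

This only changes tokenization. Whether a sentence boundary is then placed after the abbreviation depends on which sentence segmentation component the pipeline uses.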
