I think the default de is coming from the vectors in de_core_news_lg. I'll take a look at what's going on...

As an alternative, you can customize the tokenizer for de / German as described here:

https://spacy.io/usage/training#custom-tokenizer

Not having a custom language is simpler overall, including when you package the pipeline later. If you only want to modify the tokenizer settings, a custom language shouldn't be necessary.
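A rough sketch of that approach, assuming spaCy v3: a registered callback modifies the tokenizer of the stock German pipeline, so no custom language class is needed. The callback name `customize_tokenizer` and the `_SPECIAL_` token are just placeholders for illustration, not anything taken from your project.

```python
import spacy


# Hypothetical callback name; any unused name works.
@spacy.registry.callbacks("customize_tokenizer")
def make_customize_tokenizer():
    def customize_tokenizer(nlp):
        # Illustrative special case: keep "_SPECIAL_" as a single token.
        nlp.tokenizer.add_special_case("_SPECIAL_", [{"ORTH": "_SPECIAL_"}])

    return customize_tokenizer


if __name__ == "__main__":
    # Quick local check on a blank German pipeline (no trained model needed).
    nlp = spacy.blank("de")
    make_customize_tokenizer()(nlp)
    print([t.text for t in nlp("Das ist _SPECIAL_ hier.")])
```

For training, the callback would be referenced from the config under `[initialize.before_init]` with `@callbacks = "customize_tokenizer"`, and the file containing it passed to `spacy train` with `--code`. Because the language stays `de`, the tokenizer settings are saved with the pipeline and nothing extra has to be shipped when packaging.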

Answer selected by adrianeboyd
Labels: feat / tokenizer, feat / config