
After finding #7303, I managed to fix the tokenization difference between English and German by adding the following:

    import spacy

    # add "." as an explicit suffix rule so a trailing period is split off (assumes `nlp` is an already-loaded pipeline)
    suffixes = nlp.Defaults.suffixes + [r"\."]
    suffix_regex = spacy.util.compile_suffix_regex(suffixes)
    nlp.tokenizer.suffix_search = suffix_regex.search
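
For illustration, here is a minimal self-contained sketch of applying the same change; the pipeline name `de_core_news_sm` and the example sentence are assumptions added here, not part of the original fix:

    import spacy

    # assumed German pipeline; the same change applies to any loaded pipeline
    nlp = spacy.load("de_core_news_sm")

    # append "." as an explicit suffix rule and rebuild the suffix regex
    suffixes = nlp.Defaults.suffixes + [r"\."]
    suffix_regex = spacy.util.compile_suffix_regex(suffixes)
    nlp.tokenizer.suffix_search = suffix_regex.search

    # with the added rule, a trailing "." is split off as its own token
    print([t.text for t in nlp("Die Katze sitzt auf der Matte.")])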

Answer selected by svlandeg
Labels: lang / de (German language data and models), feat / tokenizer (Feature: Tokenizer)