Closed
Labels
feat / ner (Feature: Named Entity Recognizer), lang / de (German language data and models), perf / accuracy (Performance: accuracy)
Description
How to reproduce the behaviour
import spacy

model = spacy.load("de_core_news_lg")
# "Freundliche Grüße Nadia" (start_char: 0, end_char: 22, label_: "PER")
tokens = model("Freundliche Grüße Nadia")
# "Freundliche Grüße" (start_char: 0, end_char: 16, label_: "PER")
# "Nadia" (start_char: 32, end_char: 36, label_: "MISC")
tokens = model("Freundliche Grüße meine liebste Nadia")
# "Hallo Herr Müller" (start_char: 0, end_char: 16, label_: "PER")
tokens = model("Hallo Herr Müller.")
# no entities recognized
tokens = model("Hallo Herr Müller")
The comments above show the contents of tokens.ents after each call.
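For anyone who wants to reproduce these observations, here is a minimal sketch of how the entity spans could be listed. The helper names `describe_entities` and `run_examples` are hypothetical and not part of spaCy; the sketch relies only on the `ents`, `text`, `start_char`, `end_char`, and `label_` attributes. Note that spaCy reports `end_char` as an exclusive offset, so the printed values may be one higher than the end positions quoted in the comments.

```python
def describe_entities(doc):
    """Return (text, start_char, end_char, label_) for each entity in doc.ents.

    Hypothetical helper: works on any object exposing an `ents` sequence
    whose items carry these four attributes, such as a spaCy Doc.
    """
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def run_examples(model_name="de_core_news_lg"):
    """Load the model and print the entities for each sentence from the report."""
    import spacy  # imported here so describe_entities stays usable without spaCy

    nlp = spacy.load(model_name)
    sentences = [
        "Freundliche Grüße Nadia",
        "Freundliche Grüße meine liebste Nadia",
        "Hallo Herr Müller.",
        "Hallo Herr Müller",
    ]
    for text in sentences:
        # One line per sentence: the input followed by its recognized entities.
        print(repr(text), describe_entities(nlp(text)))
```

Calling `run_examples()` with the model installed prints one line per sentence, which makes it easy to compare the trailing-period and no-period variants side by side.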
Note that for English we were able to reproduce the documented behavior: the model accurately identified the entity.
That said, in English we noticed some issues with typos; switching to en_core_web_trf resolved them.
Unfortunately, de_dep_news_trf does not support entities.
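Whether a given pipeline can produce entities at all can be checked before running it. The sketch below assumes the entity recognizer is registered under the conventional component name `ner`; `has_ner` is a hypothetical helper, while `pipe_names` is the attribute spaCy exposes on a loaded pipeline.

```python
def has_ner(nlp, component_name="ner"):
    """Return True if the pipeline lists a component with the given name.

    Assumes the entity recognizer uses the conventional name "ner";
    a custom pipeline could register it under a different name.
    """
    return component_name in nlp.pipe_names
```

For example, `has_ner(spacy.load("de_dep_news_trf"))` would be expected to return False if that pipeline ships without an entity recognizer, as described above.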
Your Environment
- Operating System: OSX
- Python Version Used: 3.11
- spaCy Version Used: 3.5.1 (de-core-news-lg)
- Environment Information: