ValueError: [E150] when using customized model #9898
Hi, I am working on an NER project and would kindly ask for your help. In this project, named entities such as the invoice's date or ID have to be recognized in text snippets taken from invoices. This works quite well, but the default tokenizer from the "de" module (as well as the "en" one) does not isolate all tokens as required. There are two cases so far, exemplified by the following code:
What I need is to isolate an entity even if a matching closing parenthesis is missing or if two entities are separated by a colon, like this:
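For illustration, the colon case can usually be handled by extending the default infix rules so the tokenizer splits on ":" (and on parentheses) anywhere inside a token. This is a minimal sketch with a blank German pipeline; the string `Datum:2021` is an invented stand-in for the actual invoice data:

```python
from spacy.lang.de import German
from spacy.util import compile_infix_regex

# Blank German pipeline -- only the tokenizer is needed, no trained model.
nlp = German()

# Add ":" and parentheses to the default infix patterns so the tokenizer
# splits on them even between alphanumeric characters.
infixes = list(nlp.Defaults.infixes) + [r"[:()]"]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

print([t.text for t in nlp("Datum:2021")])  # → ['Datum', ':', '2021']
```

This only patches the in-memory tokenizer; to make the change part of a trained pipeline it has to go into the config, as discussed below in the thread.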
Since this project is supposed to work with the project/config workflow of spaCy 3, I set about creating a custom language with a customized tokenizer.
This file works during the conversion of my labeled data to spaCy's binary format, but when I incorporate it into the spaCy workflow, I encounter this error message:
And the CLI command that throws the error is this:
The default "de" model is referenced nowhere in my code, so I am surprised to see it mentioned in the error message. Do you have any idea what is happening here? If you do, I would be grateful to hear from you. Best regards.
Replies: 1 comment 1 reply
I think the default `de` is coming from the vectors in `de_core_news_lg`. I'll take a look at what's going on...

As an alternative, you can customize the tokenizer for `de`/German as described here instead: https://spacy.io/usage/training#custom-tokenizer

It's simpler overall, also when packaging the pipeline later, if you don't have a custom language. Just for modifying the tokenizer settings, it shouldn't be necessary to have a custom language.
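Following the linked docs, the config-driven route might look roughly like this. This is a sketch under the assumption that the goal is only to extend the default German infix rules; the registry name `customized_de_tokenizer` and the extra `[:()]` pattern are invented for illustration:

```python
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import (
    compile_infix_regex,
    compile_prefix_regex,
    compile_suffix_regex,
)


@spacy.registry.tokenizers("customized_de_tokenizer")
def create_customized_tokenizer():
    def create_tokenizer(nlp):
        # Build a tokenizer from the language defaults, with ":" and
        # parentheses added as extra infix patterns.
        infixes = list(nlp.Defaults.infixes) + [r"[:()]"]
        return Tokenizer(
            nlp.vocab,
            rules=nlp.Defaults.tokenizer_exceptions,
            prefix_search=compile_prefix_regex(nlp.Defaults.prefixes).search,
            suffix_search=compile_suffix_regex(nlp.Defaults.suffixes).search,
            infix_finditer=compile_infix_regex(infixes).finditer,
            token_match=nlp.Defaults.token_match,
            url_match=nlp.Defaults.url_match,
        )

    return create_tokenizer
```

In the training config you would then point `[nlp.tokenizer]` at the registered function with `@tokenizers = "customized_de_tokenizer"`, and pass the file containing it to the CLI via `--code` (e.g. to `spacy train`) so the function is available when the config is resolved. This way the base `de` language is kept and no custom language subclass is needed.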