KeyError when training the tagger #10590
-
I am getting KeyError: '' when I try to train the model with my own data. The data has the following format: I then convert it to IOB, and then to spaCy format. When training runs it raises this error, and I don't know why it is not picking up the POS values from the input dataset. I found another issue similar to this one, but none of the links in it are active any more, so I have reopened the discussion.
Replies: 9 comments 28 replies
-
What is the text of your error? What command did you run that produced the error? What does your data actually look like? Please give an example, it's not clear from the headers you posted. How do you convert your data to IOB and then spaCy? If there is a similar issue, please link to it.
-
What is the text of your error? What command did you run that produced the error?
File "spacy\pipeline\tagger.pyx", line 205, in spacy.pipeline.tagger.Tagger.update
What does your data actually look like? Please give an example, it's not clear from the headers you posted.
0 La DET O
How do you convert your data to IOB and then spaCy?
#Convert into iob format
#Convert into spacy format
If there is a similar issue, please link to it.
-
Are you running spaCy from the command line? If so, can you provide the command you ran? If not, can you provide a minimal code snippet that reproduces the error?
-
This is all the code I have; the error mentioned above appears when I run it:
-
I suspect that what is happening here is that a value in your CSV (probably Tag) is missing/blank, which is why the KeyError is for an empty string. I would check your original CSV to see if that's the case for any entries.
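One way to run that check is sketched below. It is only an illustration: the column names `Word`, `POS` and `Tag`, the helper `find_blank_cells`, and the inline sample data are assumptions, since the real CSV layout is not shown in this thread.

```python
import csv
import io


def find_blank_cells(rows, columns):
    """Return (row_number, column) pairs where a required cell is missing or blank."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column in columns:
            value = row.get(column)
            # DictReader yields None for cells missing from a short row
            if value is None or not value.strip():
                problems.append((row_number, column))
    return problems


# Hypothetical sample; replace with open("your_file.csv", newline="", encoding="utf-8")
sample = io.StringIO("Word,POS,Tag\nLa,DET,O\ncasa,,O\n")
blank = find_blank_cells(csv.DictReader(sample), ["Word", "POS", "Tag"])
print(blank)  # the second data row has an empty POS cell
```

Any pair it reports points at a row that would reach the converter with an empty label.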
-
Hi @polm, I am sending you a small sample of the dataset. I tried what you suggested and used only a small piece of the dataset, but the same error still occurs. I hope that with this sample we can find where the problem lies.
-
I don't know if I've mentioned it before, @polm, but the reason I am building this dataset with POS and NER tags is that I would like to have a model like es_core_news_lg, which is why I use the tagger in the pipeline.
-
Thanks a lot @adrianeboyd, I have found the error, and it was precisely what you pointed out in the first message you wrote me: the dataset had a header and an index column, and that is why it could not be processed correctly. Once this was fixed, the model trained properly. I didn't write to you earlier because I was still testing that it worked.
However, the one thing that strikes me is that NER works perfectly, but the dependencies from the POS tags given in the dataset are not drawn when I run
displacy.serve(doc, style="dep")
I don't know why spaCy does not draw the POS relations. Do you have any suggestion?
-
Now I have another question, @adrianeboyd. I understand that the more text a model is trained on, the better it learns the labels. For example, with the model es_core_news_lg I have seen that it labels as PERSON many entities that really do belong to that label, even though I assume its training data does not contain all the names that appear in my dataset. So my question is: why is my model, trained on my data and labels, not able to recognize a name it was not trained on? Shouldn't it be able to learn this without every case being listed explicitly? I'm very new to the subject, so any help understanding this would be appreciated. Thank you!