error in converting from conllu to spacy format #11472
-
Hi, I was converting a CoNLL-U file to spaCy format with python -m spacy convert. The file does not seem to have any problems at all, but I kept getting: ValueError: not enough values to unpack (expected 10, got 1)
Just to give an example about the conllu file:
I have exactly ten items on each line. I verified that in Python with the following code. Nothing was printed, so I assume that every line has ten items.
I don't understand why bash keeps telling me that it only gets one item. Thank you so much if any of you could help!
Replies: 4 comments
-
Please use code blocks instead of screenshots for code and text examples. Can you paste or attach one sentence that fails with the converter? It could be an encoding error; another common problem is a missing blank line after the last sentence, which the format requires. You can validate the format with the UD validation tools (https://github.com/universaldependencies/tools) just with validate.py --level 2 for the CoNLL-U format validation only.
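As a quick local check before running the full validator, the two format problems mentioned above can be sketched in Python. This is only a sketch under an assumption that the thread does not confirm, since the file itself is not shown: that the columns are separated by spaces rather than tabs. In that case a check based on str.split() with no argument would count ten items per line, while a tab-based split would see a single field, which is consistent with "expected 10, got 1". The function name check_conllu is hypothetical.

```python
def check_conllu(text):
    """Return (bad_lines, ends_with_blank_line) for the contents of a CoNLL-U file.

    bad_lines holds (line_number, field_count) for every token line that does
    not have exactly 10 tab-separated fields.
    """
    # Normalize Windows line endings so the checks below behave the same.
    text = text.replace("\r\n", "\n")
    bad_lines = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        # Skip sentence separators (blank lines) and comment/metadata lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on tabs only; a plain split() would also split on spaces.
        n_fields = len(line.split("\t"))
        if n_fields != 10:
            bad_lines.append((lineno, n_fields))
    # The last sentence is followed by a blank line if the text ends in "\n\n".
    ends_with_blank_line = text.endswith("\n\n")
    return bad_lines, ends_with_blank_line


if __name__ == "__main__":
    # Hypothetical one-token sentence with space-separated columns.
    sample = "1 Hi hi INTJ _ _ 0 root _ _\n"
    print(check_conllu(sample))
```

To run it on a real file, read the file as UTF-8 text and pass the contents in; any reported line numbers are the places to inspect first.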
-
Hi Adrianne, Thanks so much for replying!! This is the first time I have asked a question on GitHub. Sorry about the formatting. I tried validate.py and I got more than 1000 errors.
Do I have to correct every single one of them to be able to convert? This is not my own dataset, so I am not sure to what extent I should work on rectifying it. Thank you!
-
I think it's mainly the format errors that are the problem, but invalid [...] The validate script truncates the error output by default, but there should be details about the first few errors of each type, so check whether it's something systematic that's easy to fix. I suspect that the metadata errors aren't important in terms of use with the spacy converter, but it's hard to know without seeing the data or the errors. It might only be a matter of fixing some tabs or some typos in [...]
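If the systematic problem does turn out to be tabs, a repair could look roughly like the sketch below. It rests on two assumptions that are not confirmed anywhere in this thread: that the affected token lines use spaces where tabs belong, and that no field value contains a space of its own. The function name spaces_to_tabs is hypothetical. Lines that already contain a tab, comment lines, and lines that do not split into exactly ten items are left untouched so they can be inspected by hand.

```python
def spaces_to_tabs(text):
    """Rewrite space-separated token lines as tab-separated ones.

    Only lines that contain no tab and split into exactly 10
    whitespace-separated items are changed.
    """
    fixed = []
    for line in text.split("\n"):
        is_token_line = bool(line.strip()) and not line.startswith("#")
        if is_token_line and "\t" not in line:
            parts = line.split()
            # Assumption: 10 whitespace-separated items means 10 columns.
            if len(parts) == 10:
                line = "\t".join(parts)
        fixed.append(line)
    return "\n".join(fixed)
```

Writing the result to a new file, rather than overwriting the original, keeps the untouched data available for comparison, and the repaired file can then be checked again with validate.py --level 2.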
-
Thanks so much! I will try to fix them!