Error converting tsv into spacy format #10574
Hello, I'm getting an error when converting a TSV file to JSON so that I can later convert it to the spaCy format. Does anyone know how to solve it? The error is the following:
Traceback (most recent call last):
File "C:\Users\\anaconda3\envs\nlp\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\\anaconda3\envs\nlp\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\spacy\__main__.py", line 4, in <module>
setup_cli()
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\spacy\cli\_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\click\core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\click\core.py", line 1053, in main
rv = self.invoke(ctx)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\click\core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\click\core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\click\core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\spacy\cli\convert.py", line 74, in convert_cli
converter = _get_converter(msg, converter, input_path)
File "C:\Users\\anaconda3\envs\nlp\lib\site-packages\spacy\cli\convert.py", line 257, in _get_converter
input_data = file_.read()
File "C:\Users\\anaconda3\envs\nlp\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 200: invalid continuation byte
Replies: 1 comment 1 reply
Hi @JessitaMS, next time please avoid pasting screenshots; plain text is easier to view and check.
As for your question, the formatting of your file looks different from what the converters expect (e.g., there are some empty lists). You might want to compare it with the example data that we have to make sure the converters can read it. Perhaps that's what is causing the error?
In any case, you can also write a script that converts that specific file into a DocBin and then saves it in the serialized spaCy format. Based on your example, it seems that you already have the relevant information to save it as a Doc.
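A script along those lines could look roughly like the sketch below. It is not the built-in converter, and it rests on assumptions that may not match your data: the file is assumed to have one `token<TAB>tag` pair per line with a blank line between sentences, and the tags are assumed to be IOB entity labels such as `B-PER` or `O`. The function names, the `encodings` parameter and the `lang` default are made up for illustration. Since the traceback shows that the file is not valid UTF-8 (byte 0xf3 is "ó" in Latin-1/cp1252), the reader tries UTF-8 first and then falls back to cp1252, which is also only a guess about how the file was saved.

```python
from pathlib import Path


def read_tsv_sentences(path, encodings=("utf-8-sig", "cp1252")):
    """Return a list of sentences, each a list of (token, tag) pairs.

    Assumed layout (hypothetical): one `token<TAB>tag` pair per line,
    with a blank line between sentences.
    """
    raw = Path(path).read_bytes()
    # Try each candidate encoding in turn instead of assuming UTF-8.
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError(f"could not decode {path} with any of {encodings}")

    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"expected 'token<TAB>tag', got: {line!r}")
        current.append((parts[0], parts[1]))
    if current:
        sentences.append(current)
    return sentences


def sentences_to_docbin(sentences, lang="en"):
    """Build a DocBin with one Doc per sentence (set lang to your corpus language)."""
    # spaCy is imported here so the parsing helper above works without it.
    import spacy
    from spacy.tokens import Doc, DocBin

    nlp = spacy.blank(lang)
    doc_bin = DocBin()
    for sentence in sentences:
        words = [token for token, _ in sentence]
        tags = [tag for _, tag in sentence]
        # `ents` takes one IOB tag per token, e.g. "B-PER", "I-PER", "O".
        doc_bin.add(Doc(nlp.vocab, words=words, ents=tags))
    return doc_bin
```

With placeholder file names, `sentences_to_docbin(read_tsv_sentences("train.tsv")).to_disk("train.spacy")` would then write a file that can be used as a training corpus. If your columns or tag scheme differ, only `read_tsv_sentences` should need to change.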