convert from conll2003 to ner-json loses tags #8738
How to reproduce the behaviour

command:

The file must be in CoNLL-2003 format.

Bug

When converting a file from CoNLL-2003 to an NER JSON file, many of the tags get lost in the process. Here are some snippets of the CoNLL and NER files. In the first example the tags get lost; in the second one, with the same tags, the NER file keeps them:

CoNLL-2003:
ner:
CoNLL-2003:
ner:
My Environment
Hi, the problem is that this format doesn't support spaces in tokens. The converter expects to be able to split the columns on whitespace, and it expects the IOB annotation to be in the same column on every line after splitting: typically either always the 2nd column or always the 4th column, depending on the dataset.
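The failure mode described above can be illustrated with a small checker. This is a minimal sketch, not spaCy's converter code: it assumes a space-separated file with the NER tag in a fixed column, and the helper names (`split_columns`, `find_misaligned_lines`) and the sample lines are hypothetical. It flags token lines whose column count differs from the most common count in the file, which is what happens when a token contains a space.

```python
from collections import Counter


def split_columns(line: str) -> list[str]:
    # Split on any run of whitespace, as a whitespace-based column reader
    # would; a token containing a space therefore becomes two columns.
    return line.split()


def find_misaligned_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for token lines whose column count
    differs from the most common column count in the file."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank sentence separators and -DOCSTART- document markers.
        if not stripped or stripped.startswith("-DOCSTART-"):
            continue
        rows.append((lineno, stripped, len(split_columns(stripped))))
    if not rows:
        return []
    # Treat the most frequent column count as the expected layout.
    expected = Counter(n for _, _, n in rows).most_common(1)[0][0]
    return [(lineno, line) for lineno, line, n in rows if n != expected]


if __name__ == "__main__":
    # Hypothetical 4-column sample; "New York" is one token with a space.
    sample = (
        "EU NNP B-NP B-ORG\n"
        "rejects VBZ B-VP O\n"
        "New York NNP B-NP B-LOC\n"
    )
    print(find_misaligned_lines(sample))  # [(3, 'New York NNP B-NP B-LOC')]
    # Reading the 4th column (index 3) now yields the chunk tag, not the NER tag.
    print(split_columns("New York NNP B-NP B-LOC")[3])  # B-NP
```

Lines reported by a check like this would need their tokens rewritten without internal spaces before conversion; how best to do that depends on how the file was produced.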