WikiNeural to .spacy Format #12135
Hi there, I'd like to convert the Russian portion of the WikiNeural dataset (https://github.com/Babelscape/wikineural) into .spacy format. Currently it's in one-token-per-line format, and I used the built-in `spacy convert` command to convert it. However, the F1 score after training is extremely low (in the 0.01-0.05 range), and I think the problem might be with how spaCy reconstructs the tokenized data into documents. Has anyone else had this issue? A preview of the data before conversion:
Note: the original file had 3 columns: Index, Token, Label.
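For concreteness, the one-token-per-line data is shaped roughly like this (the tokens below are made up for illustration, not taken from the actual Russian file). Splitting on tabs shows the extra index column that the token/label converters don't expect:

```python
# Illustrative three-column WikiNeural-style lines (index, token, IOB label).
# These tokens are invented for the example; the real file is Russian text.
sample = """0\tBarack\tB-PER
1\tObama\tI-PER
2\tvisited\tO
3\tMoscow\tB-LOC
"""

rows = [line.split("\t") for line in sample.strip().splitlines()]
# Each row has 3 fields; the NER converters expect token/label pairs,
# so the leading index column needs to be dropped before conversion.
print(rows[0])  # ['0', 'Barack', 'B-PER']
```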
Replies: 1 comment 1 reply
After removing the first column of token indices, training an NER model from this data seems to work as expected:
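A minimal preprocessing sketch for dropping that first column, assuming tab-separated columns; the filenames and the `ner` converter choice in the comment are assumptions to adapt to your setup:

```python
def strip_index_column(text: str) -> str:
    """Drop the leading index column from tab-separated token lines,
    preserving blank lines (sentence boundaries)."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            out.append("")  # keep sentence boundary
        else:
            out.append("\t".join(line.split("\t")[1:]))  # token and label only
    return "\n".join(out) + "\n"

# Illustrative input; the real file is Russian one-token-per-line data.
three_col = "0\tToken1\tO\n1\tToken2\tB-LOC\n\n0\tToken3\tO\n"
print(strip_index_column(three_col))
```

After writing the two-column file out, it can be converted with something like `python -m spacy convert train_2col.iob ./corpus -c ner -n 10` (the `-n 10` groups sentences into multi-sentence docs; the filenames here are hypothetical).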
If you're using a transformer model, also double-check that it's one that's appropriate for Russian. It looks like we don't have a Russian-specific default, so you may need to specify the model yourself.
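As an assumption (not stated in the thread), one way to point the pipeline at a multilingual model that covers Russian is to set the transformer name in the training config, e.g.:

```ini
# Hypothetical config fragment; "xlm-roberta-base" is one multilingual
# option that covers Russian, not an official spaCy default for it.
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "xlm-roberta-base"
```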