Hi @abhishekshingadiya. As I understand it, you've used Prodigy to annotate a dataset of spans and relations, following a custom annotation scheme. You've then taken the data from Prodigy (as the above JSONL) and attempted to convert that to the spaCy format using this parse_data.py file.

Note that this parse_data.py script has not been implemented for general use. It's part of a very specific tutorial on gene annotations and biomolecular interactions. It uses variables like SYMM_LABELS and MAP_LABELS that won't be relevant for your use-case. Further, at the end of the script it actually splits the dataset into train/dev/test sets according to an article_id in example["meta"]["source"], which your data most likely doesn't have.
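To make this a bit more concrete, here is a minimal sketch of the label-agnostic part of such a conversion: pulling spans and relations out of Prodigy-style JSONL records without relying on SYMM_LABELS, MAP_LABELS or an article_id. It is not the tutorial's code. It assumes each record has a "text" field, a "spans" list with "start", "end" and "label", and a "relations" list with "head_span", "child_span" and "label", so please check those field names against your own export. The function names and the sample record (with its ORG, PERSON and EMPLOYS labels) are made up for illustration.

```python
import json


def extract_spans_and_relations(record):
    """Return (text, spans, relations) for one Prodigy-style record.

    Assumed layout (verify against your own JSONL):
      record["spans"]     -> [{"start", "end", "label"}, ...]
      record["relations"] -> [{"head_span", "child_span", "label"}, ...]
    """
    text = record["text"]
    # Character offsets plus label for every annotated span.
    spans = [(s["start"], s["end"], s["label"]) for s in record.get("spans", [])]
    relations = []
    for rel in record.get("relations", []):
        head = rel["head_span"]
        child = rel["child_span"]
        # Keep the relation as (head offsets, child offsets, label).
        relations.append(
            ((head["start"], head["end"]), (child["start"], child["end"]), rel["label"])
        )
    return text, spans, relations


def parse_jsonl_lines(lines):
    """Parse JSONL lines, keeping only records that were accepted."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: records without an "answer" field count as accepted.
        if record.get("answer", "accept") != "accept":
            continue
        parsed.append(extract_spans_and_relations(record))
    return parsed


# Hypothetical sample record, only for illustration.
sample = json.dumps(
    {
        "text": "Acme hired Bob",
        "spans": [
            {"start": 0, "end": 4, "label": "ORG"},
            {"start": 11, "end": 14, "label": "PERSON"},
        ],
        "relations": [
            {
                "head_span": {"start": 0, "end": 4, "label": "ORG"},
                "child_span": {"start": 11, "end": 14, "label": "PERSON"},
                "label": "EMPLOYS",
            }
        ],
        "answer": "accept",
    }
)
rejected = json.dumps({"text": "ignored", "answer": "reject"})

results = parse_jsonl_lines([sample, "", rejected])
```

From tuples like these you could then build spaCy Doc objects and decide on a train/dev/test split that makes sense for your own data, rather than reusing the tutorial's article-based split.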

Answer selected by adrianeboyd
Labels
usage General spaCy usage feat / ner Feature: Named Entity Recognizer feat / rel Feature: Relation Extractor
2 participants