Hi @wallybeamm!

I think the key to solving your problem is here:

> So, I customized the parse_data.py and generated a '.spacy' file.

This .spacy file should, for each of your documents, contain the entities in doc.ents and the relations in doc._.rel. Your approach to modifying parse_data.py is the right way to go about this: basically you need to adjust this script so it parses your Doccano files instead of the Prodigy format.
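As a rough illustration of what each document in the .spacy file could look like, here is a minimal sketch. The sentence, the `ACQUIRED` label, the hand-picked token indices, and the output path are all made up for the example. The layout of `doc._.rel` (a dict keyed by the start tokens of the two entities, mapping to label scores) is an assumption about what the training code expects, so please check it against your own copy of parse_data.py.

```python
import spacy
from spacy.tokens import Doc, DocBin, Span

# Register the custom attribute that holds the relations.
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})

nlp = spacy.blank("en")
doc = nlp("Apple acquired Beats in 2014.")

# Hand-picked token indices, for illustration only.
head = Span(doc, 0, 1, label="ORG")   # "Apple"
child = Span(doc, 2, 3, label="ORG")  # "Beats"
doc.ents = [head, child]

# Assumed layout: (head start token, child start token) -> {label: score}.
doc._.rel = {(head.start, child.start): {"ACQUIRED": 1.0}}

# store_user_data=True keeps custom attributes such as doc._.rel.
doc_bin = DocBin(docs=[doc], store_user_data=True)
# doc_bin.to_disk("train.spacy")  # hypothetical output path
```

Without `store_user_data=True`, the `DocBin` would keep the entities but drop the custom `rel` attribute.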

I understand that in the Doccano format, you don't immediately have access to the token indices (token_start and token_end). However, if you're creating the Doc object and setting the entities in doc.ents via their character indices start_offset and end_offset

entit…
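To illustrate the character-offset route mentioned above, here is a minimal sketch using `Doc.char_span`. The `annotations` list is a hypothetical stand-in for one parsed Doccano record (the `id` and `label` keys are assumptions; only `start_offset` and `end_offset` come from this thread). Once the spans exist, their token indices are available as `span.start` and `span.end`.

```python
import spacy

nlp = spacy.blank("en")
text = "Apple acquired Beats in 2014."

# Hypothetical Doccano-style entity annotations with character offsets.
annotations = [
    {"id": 1, "label": "ORG", "start_offset": 0, "end_offset": 5},
    {"id": 2, "label": "ORG", "start_offset": 15, "end_offset": 20},
]

doc = nlp(text)
span_by_id = {}
for ann in annotations:
    # char_span returns None if the offsets don't line up with token boundaries.
    span = doc.char_span(ann["start_offset"], ann["end_offset"], label=ann["label"])
    if span is None:
        continue  # skip (or log) annotations that don't align with the tokenization
    span_by_id[ann["id"]] = span

doc.ents = list(span_by_id.values())

# Token indices are now available from the spans themselves.
token_starts = {ann_id: span.start for ann_id, span in span_by_id.items()}
```

Annotations that return `None` usually point to a mismatch between the annotated offsets and the tokenizer's boundaries, so it is worth counting how many get skipped.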

Answer selected by wallybeamm
Labels: usage (General spaCy usage), feat / rel (Feature: Relation Extractor)