Training SpanCat without Prodigy? #12124
SharbelWired
started this conversation in
Language Support
Replies: 1 comment 4 replies
I'd recommend double-checking all the spans at this step:
    if span is not None:
        spans.append(span)
If there's something wrong with the character offsets coming from your original annotation, it's possible that no spans are being added to the training docs. You can also use [...]

If the spans are being converted correctly, the next thing to check is how many of the annotated spans are covered by the suggester, which suggests 1-3-grams by default. If you have longer spans, this default wouldn't be suitable for your task.
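Both checks can be sketched in a few lines. This is only an illustration under stated assumptions: `audit_spans` is a hypothetical helper, the annotations are assumed to be plain `(start, end, label)` character offsets, and `max_ngram=3` mirrors the default 1-3-gram suggester mentioned above.

```python
import spacy


def audit_spans(nlp, text, annotations, max_ngram=3):
    """Count annotations that fail to align with tokens or exceed max_ngram tokens.

    Hypothetical helper: `annotations` is assumed to be a list of
    (start_char, end_char, label) tuples.
    """
    doc = nlp.make_doc(text)
    report = {"ok": 0, "misaligned": 0, "too_long": 0}
    for start, end, label in annotations:
        # With the default strict alignment, char_span returns None when the
        # offsets do not fall exactly on token boundaries.
        span = doc.char_span(start, end, label=label)
        if span is None:
            report["misaligned"] += 1
        elif len(span) > max_ngram:
            # Longer than anything a 1-3-gram suggester would propose.
            report["too_long"] += 1
        else:
            report["ok"] += 1
    return report


nlp = spacy.blank("en")
text = "Apple opened a store in Berlin."
annotations = [
    (0, 5, "ORG"),     # "Apple": one token, aligns cleanly
    (24, 29, "GPE"),   # "Berli": cuts a token in half
    (6, 30, "EVENT"),  # "opened a store in Berlin": five tokens
]
report = audit_spans(nlp, text, annotations)
print(report)
```

If many spans land in `misaligned`, the offsets from the annotation tool are the likely problem. If many land in `too_long`, the `sizes` setting of the n-gram suggester in the `[components.spancat.suggester]` block of the config probably needs to be widened.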
Hi everyone, I am trying to train a spancat model manually using the command-line spacy train workflow. When I use Prodigy, I can label my text samples, highlight spans, and train via Prodigy's training integration with spaCy. When I train a custom spancat model this way, everything works fine.
When I tried to use annotations that were created in another labelling app (e.g. Label Studio), I first exported from Label Studio in the CoNLL 2003 format (following the suggestion in the docs: https://labelstud.io/guide/export.html#spaCy) and ran spacy convert on the result. This did not work well for me, since it created a single document instead of the 300+ that are actually there.
So, instead, I figured I would just import the raw JSON file from Label Studio, iterate over the documents, and manually create the DocBin using the start/end offsets for each label in the file. I am using the following, which takes in a JSON dict and then attempts to create a new DocBin with the spans and their corresponding labels attached to each doc.
When I train on this, the scores are basically 0, although sometimes I do get marginal results. Again, when I train with Prodigy on MUCH fewer samples (e.g. only 10!), I see scores come back as expected. I must be doing something wrong when I rebuild the DocBin manually. Is the code below essentially what is needed to create a valid DocBin for spancat training?
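For reference, a minimal sketch of this kind of conversion (not the poster's original code) might look like the following. It assumes the annotations have already been flattened into a hypothetical record shape, `{"text": ..., "spans": [{"start", "end", "label"}]}`, which is not Label Studio's actual export format, and it stores the spans under the key `"sc"`, the default `spans_key` for spancat. `records_to_docbin` is an illustrative name.

```python
import spacy
from spacy.tokens import DocBin


def records_to_docbin(nlp, records, spans_key="sc"):
    """Build one Doc per record and collect them in a DocBin.

    Hypothetical helper: each record is assumed to look like
    {"text": str, "spans": [{"start": int, "end": int, "label": str}]}.
    Returns the DocBin and the number of annotations that could not be aligned.
    """
    doc_bin = DocBin()
    skipped = 0
    for record in records:
        doc = nlp.make_doc(record["text"])
        spans = []
        for ann in record["spans"]:
            span = doc.char_span(ann["start"], ann["end"], label=ann["label"])
            if span is not None:
                spans.append(span)
            else:
                skipped += 1  # offsets did not match token boundaries
        # spancat reads its gold spans from doc.spans[spans_key], not doc.ents
        doc.spans[spans_key] = spans
        doc_bin.add(doc)
    return doc_bin, skipped


nlp = spacy.blank("en")
records = [
    {
        "text": "Apple opened a store in Berlin.",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 24, "end": 30, "label": "GPE"},
        ],
    },
    {"text": "Nothing to label here.", "spans": []},
]
doc_bin, skipped = records_to_docbin(nlp, records)

# Round-trip through the DocBin to confirm what will reach training.
docs = list(doc_bin.get_docs(nlp.vocab))
first_doc_spans = [(span.text, span.label_) for span in docs[0].spans["sc"]]
print(len(docs), skipped, first_doc_spans)

# doc_bin.to_disk("train.spacy")  # uncomment to write the training file
```

Printing the number of skipped annotations and reading the docs back out of the DocBin is a cheap way to confirm that the spans actually reach the training data, and that each source document becomes its own Doc.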
This is the config that I am using. I basically used the quickstart and ran the fill command afterwards, leaving most things at their defaults: