Implementing SpanRuler to train data on - weak supervision #12391
Hi, I have been exploring spaCy with manual annotation for the most part and would like to switch to weak supervision with SpanRuler, as this seems to fit my use case best, while keeping only a certain part of my dataset manually annotated. I have about 1000 contracts containing 19 clause types of interest. Most contracts should contain at least 10 such clauses. For each clause, I have between 4 and 8 patterns that will locate and annotate the clause within the document. I will use this, and only this, data to then train my blank model. For practice purposes, I tried the following six patterns to extract and annotate contract clauses mentioning governing law:
My questions are:
I understand that this is a beginner question, but I keep reading the documentation back and forth and googling for similar projects and/or issues and still can't comprehend the above clearly, so any explanation will help greatly :) Thanks!
Replies: 1 comment
Hey, thanks for your question!

All information generated by the individual components is saved in the doc object. The SpanRuler adds all extracted spans to the doc.spans attribute. You can read more about this in our docs about processing pipelines. See here how to create a DocBin object with a list of docs; you can then save the DocBin as a .spacy file to use it for training/testing.

Looking at your code, there are some minor issues. I would recommend watching videos like these first to get a better understanding of how matching works in spaCy and to have some nice examples to refer back to:
https://www.youtube.com/watch?v=BXzFAjtenHM&ab_channel=Explosion
https://www.youtube.com/watch?v=1Un…

About the training, could you provide us with more information about your …
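For readers following along, the workflow described in the reply (SpanRuler writes to doc.spans, the docs go into a DocBin, and the DocBin is saved as a .spacy file) might look roughly like the sketch below. The pattern, the GOVERNING_LAW label, the example sentences, and the train.spacy file name are all made up for illustration and are not the patterns from the question. The file is written to a temporary directory only to keep the sketch self-contained.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin

# Blank English pipeline: tokenizer only, plus the rule-based SpanRuler.
nlp = spacy.blank("en")
# "ruler" is the SpanRuler's default spans_key; it is spelled out here for clarity.
ruler = nlp.add_pipe("span_ruler", config={"spans_key": "ruler"})

# Hypothetical token pattern (an assumption, not taken from the question).
ruler.add_patterns([
    {
        "label": "GOVERNING_LAW",
        "pattern": [
            {"LOWER": "governed"},
            {"LOWER": "by"},
            {"LOWER": "the"},
            {"LOWER": "laws"},
            {"LOWER": "of"},
        ],
    },
])

# Made-up example sentences standing in for contract text.
texts = [
    "This Agreement shall be governed by the laws of England.",
    "Each party shall bear its own costs.",
]

# Run the pipeline; matched spans end up in doc.spans["ruler"].
docs = list(nlp.pipe(texts))
for doc in docs:
    print([(span.text, span.label_) for span in doc.spans.get("ruler", [])])

# Collect the annotated docs in a DocBin and save it as a .spacy file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "train.spacy"  # hypothetical file name
    DocBin(docs=docs).to_disk(path)

    # Load the file back to check that the span annotations survived.
    reloaded = list(DocBin().from_disk(path).get_docs(nlp.vocab))

for doc in reloaded:
    print([(span.text, span.label_) for span in doc.spans.get("ruler", [])])
```

If the saved spans are later used to train a span categorizer, note that the spancat component reads its training spans from its own spans_key (default "sc"), so the key used by the SpanRuler and the one in the training config would presumably need to match.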