Is iterating through a doc a bad way to manually set my labels? #9366
I couldn't phrase the title any better, I'm really sorry. Basically, I've been generating all the training data for my NER model with a Python script that uses …

The thing is: is there a problem with working like that? All the tutorials I have seen (mostly for spaCy 2.x) used …

I'm still getting the hang of spaCy, so I'm sorry if the question doesn't make much sense, if I've made assumptions that don't hold, or if I've misused the technical lingo.
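For comparison, a common spaCy v3 pattern for building NER training data is to start from character offsets, turn them into `Span` objects with `Doc.char_span`, assign them with `Doc.set_ents`, and serialize the docs with `DocBin`. This is a minimal sketch, not the asker's script: the `TRAIN_DATA` list, its offsets, and the `./train.spacy` output path are hypothetical, and it assumes a blank English pipeline (tokenizer only).

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical annotations: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    ("Apple is opening an office in Berlin.", [(0, 5, "ORG"), (30, 36, "GPE")]),
]

nlp = spacy.blank("en")  # tokenizer only, no trained components
db = DocBin()

for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotations:
        # char_span returns None when the offsets don't match token boundaries
        span = doc.char_span(start, end, label=label)
        if span is None:
            print(f"Skipping misaligned entity {text[start:end]!r}")
            continue
        spans.append(span)
    # By default, tokens not covered by any span are marked as outside ("O")
    doc.set_ents(spans)
    db.add(doc)

# Hypothetical output path; the .spacy file can be passed to `spacy train`
db.to_disk("./train.spacy")
```

Working at the `Span` level rather than writing IOB tags token by token lets spaCy check boundaries for you: misaligned offsets surface as `None` from `char_span` instead of silently producing wrong labels.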
Hi @Vfgandara,

I'm curious as to what kind of errors you're getting when calling `doc.set_ents()`. Just be careful, as it sounds like there are overlapping spans in the dataset/annotations. If you can paste a traceback, I'd appreciate it!

For `set_ents()` vs. `char_span`, there shouldn't be any effect on accuracy, if that's what you meant by damaging. Although if you're just using the tokenizer, I'd suggest starting off from a…

Just a small correction: when pipes are frozen, they don't affect training (nor are they affected by it). You can check more in the training docs.