Problems migrating from 2.2.4 to Current Version - Misalignment and Training Loop #11303
It's not surprising that you might run into some changes in tokenization between v2 and v3, but a period at the end of a sentence after an ordinary word shouldn't generally be a problem: the latest tokenizer doesn't attach the period to "West" for me with your example sentence. Could you check your data again to be sure that's the problem? If you have a lot of alignment errors, and it's safe to handle them in a uniform way, you can use the
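The following minimal sketch finds the misaligned spans. It assumes your annotations are character offsets in the v2-style `(text, {"entities": [(start, end, label)]})` format. It uses `Doc.char_span`, which returns `None` in its default `"strict"` mode when the offsets don't fall on token boundaries. Passing `alignment_mode="expand"` or `alignment_mode="contract"` snaps the span outward or inward to the nearest token boundaries instead. The helper name `to_docbin` and the sample data are illustrative, not taken from the original project.

```python
import spacy
from spacy.tokens import DocBin

# Assumption: v2-style training data with character offsets
TRAIN_DATA = [
    ("The deputy of this account is James West.", {"entities": [(30, 40, "PERSON")]}),
]

# A blank pipeline is enough here: only the tokenizer is needed
nlp = spacy.blank("en")


def to_docbin(data, alignment_mode="strict"):
    """Convert character-offset annotations into a DocBin and collect
    the entities whose offsets don't line up with token boundaries."""
    db = DocBin()
    misaligned = []
    for text, annotations in data:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annotations["entities"]:
            span = doc.char_span(start, end, label=label, alignment_mode=alignment_mode)
            if span is None:
                # The offsets can't be mapped to tokens with this alignment mode
                misaligned.append((text[start:end], text))
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db, misaligned
```

After reviewing the result, calling `db.to_disk("./train.spacy")` writes a file that `spacy train` can read. Whether `"expand"` or `"contract"` is the right uniform fix depends on your data, so it's worth reviewing the `misaligned` list before choosing.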
NER doesn't benefit from context beyond roughly a paragraph, and working with very long documents can cause a variety of other issues, so it would probably be easier to work with your data if you broke your documents up on paragraph or page boundaries.
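One way to split documents without losing the annotations is to shift each entity's offsets so they are relative to its paragraph. This applies if your documents are plain text with blank lines between paragraphs. The sketch below assumes that blank-line convention and the same character-offset format. `split_paragraphs` is a hypothetical helper. Entities that cross a paragraph break are returned in a separate list so you can inspect them.

```python
import re


def split_paragraphs(text, entities, sep=r"\n\s*\n"):
    """Split one long annotated document into paragraph-sized examples.

    `entities` are (start, end, label) character offsets into `text`.
    Returns (examples, dropped): `examples` are (text, {"entities": [...]})
    pairs with offsets shifted to each paragraph, and `dropped` lists the
    entities that cross a paragraph break.
    """
    # Character ranges of the paragraphs between separator matches
    bounds, pos = [], 0
    for match in re.finditer(sep, text):
        bounds.append((pos, match.start()))
        pos = match.end()
    bounds.append((pos, len(text)))

    examples, kept = [], set()
    for p_start, p_end in bounds:
        para = text[p_start:p_end]
        if not para.strip():
            continue  # skip empty paragraphs
        ents = []
        for i, (start, end, label) in enumerate(entities):
            if p_start <= start and end <= p_end:
                ents.append((start - p_start, end - p_start, label))
                kept.add(i)
        examples.append((para, {"entities": ents}))

    dropped = [ent for i, ent in enumerate(entities) if i not in kept]
    return examples, dropped
```

Splitting on page boundaries would work the same way with a different `sep` pattern, for example a form-feed character, if that's how your pages are delimited.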
Sorry, I don't understand. Is an "interaction" an iteration or something else? If the new training takes only a couple of minutes, that sounds like something is wrong and your data is being skipped. As you mention, you should probably fix the alignment issues in your data first.
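To check whether data is being skipped, one option is a minimal v3 custom loop that counts how many examples each epoch actually trains on. The config-based `spacy train` command is the usual v3 route. This loop is only a sketch for comparison, assuming the same v2-style data format. The function name `train_ner`, the sample data, and the hyperparameters (`n_epochs`, `batch_size`, `drop`) are illustrative. Entities that don't align to tokens still won't be learned from, so fixing alignment first remains the priority.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Assumption: v2-style (text, {"entities": [(start, end, label)]}) data
TRAIN_DATA = [
    ("The deputy of this account is James West.", {"entities": [(30, 40, "PERSON")]}),
    ("James West signed the form.", {"entities": [(0, 10, "PERSON")]}),
]


def train_ner(data, n_epochs=10, batch_size=8):
    """Minimal spaCy v3 NER training loop that records, per epoch,
    how many examples were passed to nlp.update and the NER loss."""
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")
    examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in data]
    # initialize() infers the NER labels from the examples and returns an optimizer
    optimizer = nlp.initialize(lambda: examples)
    history = []
    for epoch in range(n_epochs):
        random.shuffle(examples)
        losses = {}
        seen = 0
        for batch in minibatch(examples, size=batch_size):
            nlp.update(batch, sgd=optimizer, drop=0.2, losses=losses)
            seen += len(batch)
        history.append((epoch, seen, losses["ner"]))
    return nlp, history
```

If `seen` per epoch is far smaller than the number of training examples you expect, the data is being lost before training rather than the model training unusually fast.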
Hello
My project works with spaCy 2.2.4 and uses an NER model trained on tagged data with a training loop.
Now I am updating it to the current version of spaCy.
The training data contains 300 documents, some as large as 300 KB.
My results range from 75% to 85%.
Problem 1 - Misalignment warnings
I used to have no problems tagging words that ended in a comma or period.
Example
"The deputy of this account is James West."
"The deputy of this account is [James West]."
Is there any configuration I should change to fix this, or should I include the period inside the entity span?
Problem 2 - Training Loop
In 2.2.4, my training loop takes about 150 hours, with an incremental loop of 50 interactions.
Is it expected that the new one, using a spaCy file or a custom loop, takes only a couple of minutes?
(I can't test the results to compare right now because of Problem 1: my data is misaligned.)
Thanks in advance