German NER transformer: is this possible? 🤔 #7534
-
Hey there, I am training a German NER model with a transformer configuration, and now I am having trouble understanding the CLI output.
Hasn't the F-score improved a bit too much from E0 to E1? I wonder if this indicates that I did something awfully wrong? By the way, the training took a really long time (about 48 hours on a MacBook Pro; I did not use a GPU, and my training data was CoNLL-2003). Comments would be appreciated! Cheers
-
Training transformers on CPU is going to be extremely slow (as mentioned in the docs) and is not recommended. If you don't have access to a GPU, I would recommend training with one of the non-transformer models first to establish a baseline.
It's hard to say anything about the F-score without knowing more about your data. How many samples do you have? How difficult are they to learn?
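To illustrate, here is a minimal sketch of such a CPU-friendly baseline using spaCy v3's Python training API. (The usual route is the CLI: `python -m spacy init config config.cfg --lang de --pipeline ner --optimize efficiency`, then `spacy train`.) The `TRAIN_DATA` toy sentence, the labels, and the epoch count below are placeholders, not from this thread:

```python
import random

import spacy
from spacy.training import Example

# Toy placeholder data -- swap in your own annotated German examples.
TRAIN_DATA = [
    ("Angela Merkel wohnt in Berlin.",
     {"entities": [(0, 13, "PER"), (23, 29, "LOC")]}),
]

nlp = spacy.blank("de")  # plain tok2vec pipeline, no transformer
ner = nlp.add_pipe("ner")
for _text, annotations in TRAIN_DATA:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for epoch in range(10):  # epoch count is arbitrary for this sketch
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
    print(epoch, losses)
```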
-
I personally don't think so, no. E0 is just a fully randomized network. 81% is definitely a high F-score for E1, but perhaps your entities are very straightforward to learn? Oh, I see Paul has hinted towards a similar explanation, so yes, it definitely does depend on your data; in your case I'd be quite happy ;-) (But of course, if you don't trust the results, it's always good to take the model, make some predictions on unseen text, and eyeball them!)
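As a rough sketch of that eyeballing step, assuming the model was saved by `spacy train` under `output/model-best` (the path and the sample sentence are placeholders): load the model and print its predicted entities on text it has never seen:

```python
import spacy

# Placeholder path: point this at wherever `spacy train` wrote your model.
nlp = spacy.load("output/model-best")

# Unseen German text, i.e. not taken from your training or dev sets.
text = "Angela Merkel traf sich am Montag mit Vertretern von Siemens in München."

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```

If the predicted spans and labels look sensible across a handful of such sentences, the high F-score is probably genuine rather than a training bug.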