German NER transformer: is this possible? 🤔 #7534
-
Hey there, I am training a German NER model with a transformer configuration, and now I am having trouble understanding the CLI output.
Hasn't the F-score improved a bit too much from E0 to E1? I wonder if this indicates that I did something awfully wrong? By the way, the training took a really long time (about 48 hours on a MacBook Pro; I did not use a GPU, and my training data was CoNLL-2003). Comments would be appreciated! Cheers
-
Training transformers on CPU is going to be extremely slow (as mentioned in the docs) and is not recommended. If you don't have access to a GPU, I would recommend training with one of the non-transformer models first to establish a baseline.
It's hard to say anything about the F-score without knowing more about your data. How many samples do you have? How difficult are they to learn?
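To illustrate, here is a minimal sketch of such a CPU-friendly baseline using spaCy v3's Python training API. (The usual route is the CLI: `python -m spacy init config config.cfg --lang de --pipeline ner --optimize efficiency`, then `spacy train`.) The `TRAIN_DATA` toy sentence, the labels, and the epoch count below are placeholders, not from this thread:

```python
import random

import spacy
from spacy.training import Example

# Toy placeholder data -- swap in your own annotated German examples.
TRAIN_DATA = [
    ("Angela Merkel wohnt in Berlin.",
     {"entities": [(0, 13, "PER"), (23, 29, "LOC")]}),
]

nlp = spacy.blank("de")  # plain tok2vec pipeline, no transformer
ner = nlp.add_pipe("ner")
for _text, annotations in TRAIN_DATA:
    for _start, _end, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for epoch in range(10):  # epoch count is arbitrary for this sketch
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)
    print(epoch, losses)
```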
-
I personally don't think so, no. E0 is just a fully randomized network. 81% is definitely a high F-score for E1, but perhaps your entities are very straightforward to learn? Oh, I see Paul has hinted towards a similar explanation, so yes, it definitely does depend on your data; in your case I'd be quite happy ;-) (But of course, if you don't trust the results, it's always good to take the model, make some predictions on unseen text, and eyeball them!)
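As a rough sketch of that eyeballing step, assuming the model was saved by `spacy train` under `output/model-best` (the path and the sample sentence are placeholders): load the model and print its predicted entities on text it has never seen:

```python
import spacy

# Placeholder path: point this at wherever `spacy train` wrote your model.
nlp = spacy.load("output/model-best")

# Unseen German text, i.e. not taken from your training or dev sets.
text = "Angela Merkel traf sich am Montag mit Vertretern von Siemens in München."

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```

If the predicted spans and labels look sensible across a handful of such sentences, the high F-score is probably genuine rather than a training bug.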