Spacy loss calculation #9128

baivabdash · 2021-09-03T04:19:17Z

Hi, I was training a custom spaCy NER model with a transformer. Training reported two losses, transformer_loss and NER_loss. Can anyone please help me understand how each of them is calculated? If you could also briefly describe the loss function, that would be very helpful.

Thanks in advance.

polm · 2021-09-05T07:25:22Z

For NER it's a little hard to find, but if you look around here you can see that it uses the mean squared error of the labels to calculate the loss (the mean is taken further down).

The Transformer is like the tok2vec layer in that it doesn't have its own objective: the loss from downstream components is passed back to the Transformer and reported as its loss, so that number isn't particularly meaningful if you only have one component. (Also note that while the loss is passed back to the Transformer, the relationship isn't 1-to-1, so you won't see the same values for the Transformer loss as for your downstream component. One possible cause is the optimizer learning different moments for the Transformer parameters.)
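To make the "no objective of its own" point concrete, here is a minimal pure-Python sketch, not spaCy code. It assumes the shared layer reports the sum of squared gradients that downstream components pass back to it; spaCy's Tok2Vec listener callback appears to accumulate its loss this way, but treat that as an assumption and check the source of your installed version. SharedLayer and receive_gradient are hypothetical names.

```python
from typing import Dict, List


class SharedLayer:
    """Hypothetical stand-in for an embedding layer (tok2vec/transformer)
    that has no loss function of its own."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.accumulated: List[float] = []

    def receive_gradient(self, d_output: List[float], losses: Dict[str, float]) -> None:
        # Accumulate the incoming gradient for a later backward pass.
        if not self.accumulated:
            self.accumulated = [0.0] * len(d_output)
        for i, g in enumerate(d_output):
            self.accumulated[i] += g
        # Assumption: the reported "loss" is the squared size of the gradient
        # that arrived from downstream. No gold labels are involved here.
        losses[self.name] = losses.get(self.name, 0.0) + sum(g * g for g in d_output)


losses: Dict[str, float] = {"ner": 0.42}  # downstream loss, computed elsewhere (made-up value)
shared = SharedLayer("transformer")
# Gradient w.r.t. the shared layer's output, as passed back by the NER component.
shared.receive_gradient([0.1, -0.2, 0.05], losses)
print(losses)
```

Under this assumption the shared layer's number measures how strongly downstream components are pushing on it, rather than comparing predictions with gold labels, which is consistent with it not matching the NER loss one-to-one.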

3 replies

baivabdash Sep 6, 2021
Author

Thank you for such a quick reply. The function linked in the answer above is for a rehearsal update. In the update function, there is a get_batch_loss function which seems to use cpu_log_loss to calculate the loss. This is a bit confusing: should I look at update or rehearse, and how do they differ?
Thanks.

polm Sep 6, 2021

Ah, you're right! I just looked for loss calculations and didn't realize I was in the rehearse function. Sorry about that.

To be clear, I don't have prior familiarity with the loss function here, so I'm looking it up rather than remembering how it works.

It looks like get_batch_loss is indeed the actual loss function, and uses cpu_log_loss for the primary loss calculation. Note that at the end of get_batch_loss the values are squared, but they aren't normalized, because it's not clear what to normalize them by.
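A minimal pure-Python sketch of what a multi-label log loss of this kind computes per parser state, followed by the unnormalized sum of squares. This is a paraphrase under stated assumptions, not spaCy's Cython code: it assumes each candidate transition has a cost (lowest for transitions consistent with the gold annotation), that the gradient is the softmax over all transitions minus the softmax restricted to the lowest-cost transitions, and it ignores transition validity for simplicity. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def multilabel_log_loss_grad(scores: Sequence[float], costs: Sequence[float]) -> List[float]:
    """Gradient of a multi-label log loss for one state (sketch).

    Probability mass is pushed toward every transition that has the lowest
    cost and away from all others.
    """
    best_cost = min(costs)
    gold = [c <= best_cost for c in costs]
    max_all = max(scores)
    max_gold = max(s for s, g in zip(scores, gold) if g)
    # Softmax over all transitions (shifted by the max for numerical stability).
    exp_all = [math.exp(s - max_all) for s in scores]
    z_all = sum(exp_all)
    # Softmax over only the lowest-cost ("gold") transitions.
    exp_gold = [math.exp(s - max_gold) if g else 0.0 for s, g in zip(scores, gold)]
    z_gold = sum(exp_gold)
    return [ea / z_all - eg / z_gold for ea, eg in zip(exp_all, exp_gold)]


def batch_loss(
    batch_scores: Sequence[Sequence[float]], batch_costs: Sequence[Sequence[float]]
) -> Tuple[List[List[float]], float]:
    """Per-state gradients plus a reported loss: the sum of squared gradient
    values, with no normalization by batch size."""
    d_scores = [multilabel_log_loss_grad(s, c) for s, c in zip(batch_scores, batch_costs)]
    loss = sum(d * d for row in d_scores for d in row)
    return d_scores, loss


# Two parser states, three candidate transitions each (values are made up).
scores = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
costs = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
d_scores, loss = batch_loss(scores, costs)
print(d_scores, loss)
```

Each gradient row sums to zero, and the reported number grows with the batch size because it is summed rather than averaged.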

rehearse is part of the experimental features for avoiding catastrophic forgetting, and not the normal training process.
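For contrast, the squared-error computation linked first belongs to rehearsal. A minimal sketch of that idea, assuming rehearsal compares the current model's scores with those of a frozen copy of the original model (the "tutor") and divides the difference by the number of rows. This is an interpretation of the rehearse code, not a verified copy of it, and rehearsal_loss is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def rehearsal_loss(
    student: Sequence[Sequence[float]], tutor: Sequence[Sequence[float]]
) -> Tuple[List[List[float]], float]:
    """Squared-error loss between current scores and a frozen tutor's scores.

    Returns the gradient w.r.t. the student scores and the summed squared
    error. No gold labels are used: the target is the original model's own
    output, which is what discourages forgetting.
    """
    n_rows = len(student)
    d_scores = [
        [(s - t) / n_rows for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(student, tutor)
    ]
    loss = sum(d * d for row in d_scores for d in row)
    return d_scores, loss


student = [[1.0, 2.0], [0.5, 0.5]]
tutor = [[1.0, 1.0], [0.5, 1.5]]
d_scores, loss = rehearsal_loss(student, tutor)
print(d_scores, loss)  # [[0.0, 0.5], [0.0, -0.5]] 0.5
```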

baivabdash Sep 7, 2021
Author

Thanks for the reply. Yes, working through the functions to understand them is quite cumbersome. What I understand is that a variant of log loss is used as the loss function, and that it is computed based on whole-entity matching.
