The benefit of replacing the cross-entropy loss with a loss function that considers the distance between tokens. #210
ChernovAndrey started this conversation in Show and tell · Replies: 1 comment
-
Looks promising, thank you for sharing, @ChernovAndrey!
-
Hello everyone,
I would like to share my research paper, where I replaced the cross-entropy loss with the Wasserstein loss to provide the model with information about the distance between tokens. Here is the link: https://arxiv.org/abs/2409.15367
Unfortunately, I do not have the resources to train a model from scratch with the Wasserstein loss. Instead, I fine-tuned a model on zero-shot datasets using both the cross-entropy loss and the Wasserstein loss to validate the idea.
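For anyone who wants a concrete picture of the idea before reading the paper, here is a minimal PyTorch sketch (my own illustration of the general technique, not necessarily the exact formulation used in the paper or repo). When the token vocabulary has a natural ordering, as with quantized values in Chronos, the 1-Wasserstein distance between the predicted distribution and a one-hot target reduces to the L1 distance between their CDFs. The function name `wasserstein_1d_loss` is hypothetical:

```python
import torch


def wasserstein_1d_loss(logits, target, p=1):
    """p-Wasserstein loss between the predicted token distribution and a
    one-hot target, assuming adjacent token ids represent adjacent
    quantized values (an ordered 1-D support).

    logits: (batch, vocab) raw scores
    target: (batch,) integer token ids
    """
    probs = torch.softmax(logits, dim=-1)
    # One-hot target distribution over the same ordered vocabulary.
    target_dist = torch.nn.functional.one_hot(
        target, num_classes=logits.size(-1)
    ).to(probs.dtype)
    # For distributions on an ordered 1-D support, W_p reduces to the
    # L_p distance between the two CDFs.
    cdf_diff = torch.cumsum(probs - target_dist, dim=-1)
    return cdf_diff.abs().pow(p).sum(dim=-1).mean()
```

Unlike cross-entropy, which only looks at the probability assigned to the correct token, this loss also penalizes how *far* (in token order) the predicted mass sits from the target, which is the property that motivated the swap.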
If anyone has the resources to train a model from scratch or ideas on how to improve this approach, I would be happy to hear from you and collaborate.
P.S.
The code is publicly available, so feel free to reuse it: https://github.com/ChernovAndrey/chronos-forecasting-wasserstein
Best regards,
Andrei Chernov