---
## 🧭 Navigation
### This repo contains the *full development code* for our [ICML 2025 paper](https://ibm.biz/ntl-paper). It is functional but a bit heavy, so we will archive it soon!
### To use NTL in your own project, we highly recommend the [PyPI version](https://pypi.org/project/ntloss/), which is maintained [separately here](https://ibm.biz/ntl-pypi-repo).
## 📖 Overview
**Number Token Loss (NTL)** introduces a novel approach to enhance language models' numerical reasoning capabilities. Unlike traditional cross-entropy loss that treats all incorrect predictions equally, NTL incorporates the numerical proximity of tokens, providing regression-like behavior at the token level.
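To make the idea concrete, here is a minimal, self-contained sketch of an MSE-style number token loss over a toy digit vocabulary. This is an illustration of the concept, not the repo's implementation: the names (`token_values`, `ntl_mse`) and the single-digit vocabulary are assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary: the digit tokens "0".."9", each mapped to its numeric value.
token_values = list(range(10))

def ntl_mse(logits, target_value):
    """Regression-like loss: squared error between the expected numeric
    value under the predicted token distribution and the target value."""
    probs = softmax(logits)
    expected = sum(p * v for p, v in zip(probs, token_values))
    return (expected - target_value) ** 2

# Two predictions for ground truth "4": one puts its mass on "5" (close),
# the other on "9" (far). Plain cross entropy would score both the same,
# since neither assigns probability to the correct token.
close = [0.0] * 10; close[5] = 10.0
far = [0.0] * 10; far[9] = 10.0
print(ntl_mse(close, 4) < ntl_mse(far, 4))  # the near miss is penalized less
```

In practice the loss is computed only on number tokens and added to the standard cross-entropy term; see the paper and the `ntloss` package for the actual variants and API.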
- ⚡ **No computational overhead**: NTL adds only ~1% compute time to *loss calculation*, which is negligible over a full training step.
- 📈 **Consistently improves performance**: NTL outperforms plain cross entropy across multiple architectures and math benchmarks.
- 🔢 **Performs true regression**: On regression tasks, an LM head with NTL matches a dedicated regression head.
- 🚀 **Scales to large models**: <a href="https://huggingface.co/ibm-granite/granite-3.2-2b-instruct">Granite 3.2 2B</a> and <a href="https://huggingface.co/google-t5/t5-3b">T5-3B</a> benefit heavily from NTL on math tasks like GSM8K.
## 🚀 Quick Links
- 📄 **Paper**: [Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models](https://ibm.biz/ntl-paper)
- 💻 **Use NTL**: Stable, maintained & lightweight implementation available as `ntloss` from [PyPI](https://pypi.org/project/ntloss/). Codebase available [separately here](https://ibm.biz/ntl-pypi-repo).
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
<!-- ## 🙏 Acknowledgments
This work was supported and conducted by TUM.ai & Technical University of Munich and led by IBM Research Europe.