Commit df2aa87 (parent: aa363aa)

doc: Ease documentation & navigation (#72)

* doc: Ease documentation & navigation
* doc: resolve comment
File tree: 9 files changed, +282 −753 lines
.github/workflows/ci.yml (1 addition, 1 deletion)

```diff
@@ -1,4 +1,4 @@
-name: Run Pytest
+name: Pytest
 
 on:
   push:
```

README.md (26 additions, 17 deletions)
```diff
@@ -3,13 +3,13 @@
 
 # Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models
 
-[![Paper](https://img.shields.io/badge/Paper-ICML-darkgreen.svg)](https://arxiv.org/abs/2411.02083)
-[![Landing](https://img.shields.io/badge/GitHub-Pages-blue.svg)](https://tum-ai.github.io/number-token-loss/)
-[![Demo](https://img.shields.io/badge/🤗-Demo-yellow.svg)](https://huggingface.co/spaces/jannisborn/NumberTokenLoss)
-[![Integration](https://img.shields.io/badge/💻-Integration_Example-purple.svg)](scripts/loss_integration.ipynb)
+[![Paper](https://img.shields.io/badge/Paper-ICML-darkgreen.svg)](https://ibm.biz/ntl-paper)
+[![Landing](https://img.shields.io/badge/Landing-Page-blue.svg)](https://ibm.biz/ntl-main)
+[![Demo](https://img.shields.io/badge/🤗-Demo-yellow.svg)](https://ibm.biz/ntl-demo)
+[![YouTube](https://img.shields.io/badge/YouTube-Talk-red?logo=youtube)](https://ibm.biz/ntl-5min-yt)
 [![CI](https://github.com/tum-ai/number-token-loss/actions/workflows/ci.yml/badge.svg)](https://github.com/tum-ai/number-token-loss/actions/workflows/ci.yml)
 [![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
-[![PyPI](https://badge.fury.io/py/ntloss.svg)](https://badge.fury.io/py/ntloss)
+[![PyPI](https://img.shields.io/pypi/v/ntloss?label=pypi&color=brightgreen)](https://pypi.org/project/ntloss/)
 [![Downloads](https://static.pepy.tech/badge/ntloss)](https://pepy.tech/project/ntloss)
 
 
```
```diff
@@ -22,11 +22,17 @@ Achieves better performance on math tasks without computational overhead 🚀*
 
 ---
 
+## 🧭 Navigation
+### This repo contains the *full development code* for our [ICML 2025 paper](https://ibm.biz/ntl-paper). It is functional but a bit heavy, so we will archive it soon!
+### To use NTL in your own project, we highly recommend the [PyPI version](https://pypi.org/project/ntloss/), which is maintained [separately here](https://ibm.biz/ntl-pypi-repo).
+
 ## 📖 Overview
 
 **Number Token Loss (NTL)** introduces a novel approach to enhance language models' numerical reasoning capabilities. Unlike traditional cross-entropy loss that treats all incorrect predictions equally, NTL incorporates the numerical proximity of tokens, providing regression-like behavior at the token level.
 
-![NTL Concept](resources%2Fntl-image.jpg)
+<div align="center">
+  <img src="resources/ntl-image.jpg" alt="NTL Concept" width="75%">
+</div>
 
 
 ## 🎯 Why do we need the Number Token Loss (NTL)?
```
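The Overview paragraph in the diff above says cross-entropy treats all incorrect predictions equally. A minimal stdlib-Python sketch makes that concrete (this is an illustration with an assumed digit-only toy vocabulary, not code from this repository): when the true token is "5", a model that confidently bets on the nearby "4" pays exactly the same cross-entropy as one that bets on the distant "9".

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_idx):
    """Standard token-level cross-entropy: -log p(target)."""
    return -math.log(softmax(logits)[target_idx])

# Toy vocabulary: the ten digit tokens "0"..."9"; the true token is "5".
peak_on_4 = [0.0] * 10; peak_on_4[4] = 5.0   # confident, numerically close
peak_on_9 = [0.0] * 10; peak_on_9[9] = 5.0   # confident, numerically far
# Both wrong guesses cost the same under cross-entropy.
print(math.isclose(cross_entropy(peak_on_4, 5), cross_entropy(peak_on_9, 5)))  # True
```

This indifference to numeric distance is exactly what NTL is designed to fix.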
```diff
@@ -49,7 +55,10 @@ Achieves better performance on math tasks without computational overhead 🚀*
 <strong>NTL-MSE</strong> – Dot-product expectation of numeric value with squared error (most intuitive but has some undesired local minima)
 </p>
 
-![Loss Comparison](docs/assets/loss_comparison_v4.svg)
+<div align="center">
+  <img src="docs/assets/loss_comparison_v4.svg" alt="Loss Comparison" width="50%">
+</div>
+
 
 ## 🔑 Key Features
```
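The hunk above describes NTL-MSE as a dot-product expectation of the numeric value followed by a squared error. A minimal stdlib-Python sketch of that formula (an illustrative assumption of a digit-only vocabulary; this is not the `ntloss` implementation):

```python
import math

def ntl_mse(logits, target_value, token_values):
    """NTL-MSE sketch: softmax over the number tokens, dot-product
    expectation of their numeric values, then squared error."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Dot-product expectation of the numeric value.
    expected = sum(p * v for p, v in zip(probs, token_values))
    return (expected - target_value) ** 2

token_values = list(range(10))            # digit tokens "0"..."9"
close = [-2.0] * 10; close[4] = 4.0       # mass near the target "5"
far = [-2.0] * 10; far[9] = 4.0           # same confidence, far away
# Unlike cross-entropy, NTL-MSE penalizes the distant guess more.
print(ntl_mse(close, 5.0, token_values) < ntl_mse(far, 5.0, token_values))  # True
```

As the paper describes, NTL is used as an auxiliary term alongside cross-entropy on number tokens, not as a replacement for it.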

```diff
@@ -58,18 +67,19 @@ Achieves better performance on math tasks without computational overhead 🚀*
 -**No computational overhead**: NTL adds only ~1% compute time to <emph>loss calculation</emph> which is negligible over a full training step.
 - 📈 **Consistently improves performance**: NTL outperforms plain cross entropy across multiple architectures and math benchmarks.
 - 🔢 **Performs true regression**: On regression tasks a LM head with NTL matches a dedicated regression head.
-- 🚀 **Scales to large models**: Even <a href="https://huggingface.co/ibm-granite/granite-3.2-2b-instruct">Granite 3.2 2B</a> and <a href="https://huggingface.co/google-t5/t5-3b">T5-3B</a> benefit heavily from NTL on math tasks like GSM8K.
+- 🚀 **Scales to large models**: <a href="https://huggingface.co/ibm-granite/granite-3.2-2b-instruct">Granite 3.2 2B</a> and <a href="https://huggingface.co/google-t5/t5-3b">T5-3B</a> benefit heavily from NTL on math tasks like GSM8K.
 
 
 
 ## 🚀 Quick Links
 
-- 📄 **Paper**: [Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models](https://arxiv.org/abs/2411.02083)
-- 🌐 **Project Page**: [https://tum-ai.github.io/number-token-loss/](https://tum-ai.github.io/number-token-loss/)
-- 🎮 **Interactive Demo**: [https://huggingface.co/spaces/jannisborn/NumberTokenLoss](https://huggingface.co/spaces/jannisborn/NumberTokenLoss)
+- 💻 **Use NTL**: Stable, maintained & lightweight implementation available as `ntloss` from [PyPI](https://pypi.org/project/ntloss/). Codebase available [separately here](https://ibm.biz/ntl-pypi-repo).
+- 📄 **Paper**: [Regress, Don't Guess – A Regression-like Loss on Number Tokens for Language Models](https://ibm.biz/ntl-paper)
+- 🌐 **Project Page**: [https://ibm.biz/ntl-main](https://ibm.biz/ntl-main)
+- 📺 **5 min YouTube Talk**: [Talk about the ICML paper](https://ibm.biz/ntl-5min-yt)
+- 🎮 **Interactive Demo**: [https://ibm.biz/ntl-demo](https://ibm.biz/ntl-demo)
 - 📋 **NeurIPS 2024 MathAI Workshop Poster**: [View Poster](https://github.com/tum-ai/number-token-loss/blob/main/resources/neurips_mathai_poster.pdf)
-- 💻 **Tutorial**: [loss_integration.ipynb](scripts/loss_integration.ipynb) - Easy integration into your own models
-- 💻 **PyPI**: Fetch `ntloss` from [PyPI](https://pypi.org/project/ntloss/)
+
 
 ## 🏃‍♂️ Quick Start
 
```
````diff
@@ -267,7 +277,7 @@ If you find this work useful, please cite our paper:
 and Vishwa Mohan Singh and Michael Danziger and Jannis Born},
 booktitle = {Proc. of the 42nd International Conference on Machine Learning (ICML)},
 year = {2025},
-url = {https://tum-ai.github.io/number-token-loss/}
+url = {https://ibm.biz/ntl-main}
 }
 ```
````
```diff
@@ -276,13 +286,12 @@ If you find this work useful, please cite our paper:
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
 <!-- ## 🙏 Acknowledgments
-
-This work was supported by TUM.ai, Technical University of Munich, and IBM Research Europe. Special thanks to the NeurIPS 2024 MathAI Workshop for featuring our research.
+This work was supported and conducted by TUM.ai & Technical University of Munich and led by IBM Research Europe.
 -->
 ---
 
 <div align="center">
 
-**[🌐 Project Website](https://tum-ai.github.io/number-token-loss/) | [📄 Paper](https://arxiv.org/abs/2411.02083) | [🎮 Demo](https://huggingface.co/spaces/jannisborn/NumberTokenLoss) | [💻 Integration Example](scripts/loss_integration.ipynb)**
+**[🌐 Project Website](https://ibm.biz/ntl-main) | [📄 Paper](https://ibm.biz/ntl-paper) | [🎮 Demo](https://ibm.biz/ntl-demo) | [💻 Use NTL](https://ibm.biz/ntl-pypi-repo)**
 
 </div>
```
