Commit ce922dc

add credits
1 parent fcb22b4 commit ce922dc

File tree

2 files changed (+150, -9 lines)


candle-transformers/src/models/smol/README.md

Lines changed: 149 additions & 8 deletions
@@ -105,14 +105,155 @@ cargo run --release --example smollm3 -- \
 | F16 (Safe) | 6.2GB | Med | Best | Maximum quality |
 | F32 (Safe) | 12GB | Slow | Best | Research/debugging |

-## Related Models
+# Credits & Attribution

-### Granite-Docling
-Document understanding VLM that originally used SmolLM-2 but now uses
-Granite 165M as its language backbone. See IBM's Docling project.
+## SmolLM3 Model

-## References
+### Developers
+**HuggingFace Team (HuggingFaceTB)**

-- [SmolLM Blog Post](https://huggingface.co/blog/smollm)
-- [SmolLM3 Announcement](https://huggingface.co/blog/smollm3)
-- [NoPE Paper](https://arxiv.org/abs/2410.01926) - "Length Generalization of Causal Transformers without Position Encoding"
+The SmolLM family of models demonstrates that small models can achieve impressive capabilities when trained on high-quality data.
+
+### Resources
+- **Model Card**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
+- **Model Card (Base)**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base
+- **Collection**: https://huggingface.co/collections/HuggingFaceTB/smollm3-6723884a9c35673e4f9b74a2
+- **Blog Post**: https://huggingface.co/blog/smollm3
+- **GitHub Repository**: https://github.com/huggingface/smollm
+- **License**: Apache 2.0
+
+### Key Contributors
+The SmolLM project is developed by the HuggingFace team with contributions from researchers focused on efficient LLM architectures and training methods.
+
+## NoPE Architecture
+
+### Research Paper
+**Title**: "Length Generalization of Causal Transformers without Position Encoding"
+
+**Authors**:
+- Jie Wang (Fudan University)
+- Tao Ji (Fudan University)
+- Yuanbin Wu (Fudan University)
+- Hang Yan (Fudan University)
+- Tao Gui (Fudan University)
+- Qi Zhang (Fudan University)
+- Xuanjing Huang (Fudan University)
+- Xiaoling Wang (Fudan University)
+
+**Published**: NeurIPS 2024 (Thirty-Eighth Annual Conference on Neural Information Processing Systems)
+
+**Abstract Summary**: The paper demonstrates that removing positional encoding from selected layers (NoPE: No Positional Encoding) can improve length generalization in causal transformers while maintaining or improving performance. SmolLM3 implements this with a 3:1 RoPE/NoPE ratio.
+
+**Resources**:
+- **arXiv**: https://arxiv.org/abs/2410.01926
+- **Conference**: NeurIPS 2024
+
+### Key Innovation
+The hybrid approach uses:
+- **RoPE layers** (75%): Standard rotary positional embeddings for local context
+- **NoPE layers** (25%): No positional encoding for improved length generalization
+- **Pattern**: Every 4th layer uses NoPE (0-indexed layers 3, 7, 11, 15, ...)
+
+This architecture enables SmolLM3 to handle much longer contexts (64k-128k tokens) while maintaining efficiency.
+
+## Quantized Models
+
+### Unsloth
+Quantized GGUF models are provided by **Unsloth**, a team focused on making LLM inference and fine-tuning more accessible.
+
+**Resources**:
+- **GGUF Repository**: https://huggingface.co/unsloth/SmolLM3-3B-GGUF
+- **Available Quantizations**: Q4_K_M, Q8_0, F16
+- **Website**: https://unsloth.ai/
+
+The quantization work enables running SmolLM3 efficiently on consumer hardware with minimal quality loss.
+
+## Implementation Credits
+
+### This Candle Implementation
+**Implemented for**: Candle ML Framework
+**Implementation Date**: Nov 2025
+**Features**:
+- Full precision model (F32/F16/BF16)
+- Quantized model (Q4_K_M/Q8_0/F16 GGUF)
+- Unified example supporting both
+- Verified against reference implementations
+
+**Verification**:
+- Full precision: Validated against the HuggingFace Transformers Python implementation
+- Quantized: Validated against the llama.cpp implementation
+
+### Related Tools & Frameworks
+
+**Candle**: Minimalist ML framework in Rust by HuggingFace
+- GitHub: https://github.com/huggingface/candle
+
+**llama.cpp**: Efficient LLM inference in C/C++
+- GitHub: https://github.com/ggerganov/llama.cpp
+- Used for quantized model verification
+
+**HuggingFace Transformers**: Reference Python implementation
+- GitHub: https://github.com/huggingface/transformers
+- Used for full model verification
+
+## Acknowledgments
+
+Special thanks to:
+
+1. **HuggingFace Team** - For developing SmolLM3 and making it openly available under the Apache 2.0 license
+2. **NoPE Researchers** - For advancing the field with novel positional encoding approaches
+3. **Unsloth** - For providing optimized quantized versions
+4. **Candle Contributors** - For building an excellent ML framework in Rust
+5. **Open Source Community** - For tools like llama.cpp that enable verification and benchmarking
+
+## Citation
+
+If you use SmolLM3 in your research or applications, please cite:
+
+### SmolLM3 Model
+```bibtex
+@misc{smollm3,
+  title={SmolLM3},
+  author={HuggingFace Team},
+  year={2024},
+  publisher={HuggingFace},
+  howpublished={\url{https://huggingface.co/HuggingFaceTB/SmolLM3-3B}}
+}
+```
+
+### NoPE Paper
+```bibtex
+@inproceedings{wang2024length,
+  title={Length Generalization of Causal Transformers without Position Encoding},
+  author={Wang, Jie and Ji, Tao and Wu, Yuanbin and Yan, Hang and Gui, Tao and Zhang, Qi and Huang, Xuanjing and Wang, Xiaoling},
+  booktitle={Thirty-Eighth Annual Conference on Neural Information Processing Systems},
+  year={2024}
+}
+```
+
+### Candle Framework
+```bibtex
+@software{candle,
+  title={Candle: Minimalist ML Framework},
+  author={HuggingFace},
+  year={2024},
+  url={https://github.com/huggingface/candle}
+}
+```
+
+## License
+
+- **SmolLM3 Model**: Apache 2.0
+- **This Implementation**: Follows the Candle framework license
+- **Candle Framework**: Apache 2.0 and MIT dual-licensed
+
+## Further Reading
+
+- **SmolLM Blog Series**: https://huggingface.co/blog/smollm and https://huggingface.co/blog/smollm3
+- **Model Card Details**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
+- **NoPE Paper**: https://arxiv.org/abs/2410.01926
+- **Candle Documentation**: https://huggingface.github.io/candle/
+
+---
+
+This implementation stands on the shoulders of giants. Thank you to all the researchers, engineers, and open source contributors who make this work possible.

candle-transformers/src/models/smol/smollm3.rs

Lines changed: 1 addition & 1 deletion
@@ -463,7 +463,7 @@ impl ModelForCausalLM {
             .narrow(1, l - 1, 1)?
             .apply(&self.lm_head)
     }
-
+
     pub fn clear_kv_cache(&mut self) {
         self.base.clear_kv_cache();
    }
