Commit ce922dc

add credits
1 parent fcb22b4 commit ce922dc

File tree

2 files changed (+150, -9 lines)


candle-transformers/src/models/smol/README.md

Lines changed: 149 additions & 8 deletions
@@ -105,14 +105,155 @@ cargo run --release --example smollm3 -- \
 | F16 (Safe) | 6.2GB | Med | Best | Maximum quality |
 | F32 (Safe) | 12GB | Slow | Best | Research/debugging |

-## Related Models
+# Credits & Attribution

-### Granite-Docling
-Document understanding VLM that originally used SmolLM-2 but now uses
-Granite 165M as its language backbone. See IBM's Docling project.
+## SmolLM3 Model

-## References
+### Developers
+**HuggingFace Team (HuggingFaceTB)**

-- [SmolLM Blog Post](https://huggingface.co/blog/smollm)
-- [SmolLM3 Announcement](https://huggingface.co/blog/smollm3)
-- [NoPE Paper](https://arxiv.org/abs/2410.01926) - "Length Generalization of Causal Transformers without Position Encoding"
+The SmolLM family of models demonstrates that small models can achieve impressive capabilities when trained on high-quality data.
+
+### Resources
+- **Model Card**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
+- **Model Card (Base)**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base
+- **Collection**: https://huggingface.co/collections/HuggingFaceTB/smollm3-6723884a9c35673e4f9b74a2
+- **Blog Post**: https://huggingface.co/blog/smollm3
+- **GitHub Repository**: https://github.com/huggingface/smollm
+- **License**: Apache 2.0
+
+### Key Contributors
+The SmolLM project is developed by the HuggingFace team with contributions from researchers focused on efficient LLM architectures and training methods.
+
+## NoPE Architecture
+
+### Research Paper
+**Title**: "Length Generalization of Causal Transformers without Position Encoding"
+
+**Authors**:
+- Jie Wang (Fudan University)
+- Tao Ji (Fudan University)
+- Yuanbin Wu (Fudan University)
+- Hang Yan (Fudan University)
+- Tao Gui (Fudan University)
+- Qi Zhang (Fudan University)
+- Xuanjing Huang (Fudan University)
+- Xiaoling Wang (Fudan University)
+
+**Published**: NeurIPS 2024 (Thirty-Eighth Annual Conference on Neural Information Processing Systems)
+
+**Abstract Summary**: The paper demonstrates that removing positional encoding from selected layers (NoPE: No Positional Encoding) can improve length generalization in causal transformers while maintaining or improving performance. SmolLM3 implements this with a 3:1 RoPE/NoPE ratio.
+
+**Resources**:
+- **arXiv**: https://arxiv.org/abs/2410.01926
+- **Conference**: NeurIPS 2024
+
+### Key Innovation
+The hybrid approach uses:
+- **RoPE layers** (75%): Standard rotary positional embeddings for local context
+- **NoPE layers** (25%): No positional encoding for improved length generalization
+- **Pattern**: Every 4th layer uses NoPE (0-indexed layers 3, 7, 11, 15, ...)
+
+This architecture enables SmolLM3 to handle much longer contexts (64k-128k tokens) while maintaining efficiency.
+
+## Quantized Models
+
+### Unsloth
+Quantized GGUF models are provided by **Unsloth**, a team focused on making LLM inference and fine-tuning more accessible.
+
+**Resources**:
+- **GGUF Repository**: https://huggingface.co/unsloth/SmolLM3-3B-GGUF
+- **Available Quantizations**: Q4_K_M, Q8_0, F16
+- **Website**: https://unsloth.ai/
+
+The quantization work enables running SmolLM3 efficiently on consumer hardware with minimal quality loss.
+
+## Implementation Credits
+
+### This Candle Implementation
+**Implemented for**: Candle ML Framework
+**Implementation Date**: Nov 2025
+**Features**:
+- Full precision model (F32/F16/BF16)
+- Quantized model (Q4_K_M/Q8_0/F16 GGUF)
+- Unified example supporting both
+- Verified against reference implementations
+
+**Verification**:
+- Full precision: Validated against the HuggingFace Transformers Python implementation
+- Quantized: Validated against the llama.cpp implementation
+
+### Related Tools & Frameworks
+
+**Candle**: Minimalist ML framework in Rust by HuggingFace
+- GitHub: https://github.com/huggingface/candle
+
+**llama.cpp**: Efficient LLM inference in C/C++
+- GitHub: https://github.com/ggerganov/llama.cpp
+- Used for quantized model verification
+
+**HuggingFace Transformers**: Reference Python implementation
+- GitHub: https://github.com/huggingface/transformers
+- Used for full model verification
+
+## Acknowledgments
+
+Special thanks to:
+
+1. **HuggingFace Team** - For developing SmolLM3 and making it openly available under the Apache 2.0 license
+2. **NoPE Researchers** - For advancing the field with novel positional encoding approaches
+3. **Unsloth** - For providing optimized quantized versions
+4. **Candle Contributors** - For building an excellent ML framework in Rust
+5. **Open Source Community** - For tools like llama.cpp that enable verification and benchmarking
+
+## Citation
+
+If you use SmolLM3 in your research or applications, please cite:
+
+### SmolLM3 Model
+```bibtex
+@misc{smollm3,
+  title={SmolLM3},
+  author={HuggingFace Team},
+  year={2024},
+  publisher={HuggingFace},
+  howpublished={\url{https://huggingface.co/HuggingFaceTB/SmolLM3-3B}}
+}
+```
+
+### NoPE Paper
+```bibtex
+@inproceedings{wang2024length,
+  title={Length Generalization of Causal Transformers without Position Encoding},
+  author={Wang, Jie and Ji, Tao and Wu, Yuanbin and Yan, Hang and Gui, Tao and Zhang, Qi and Huang, Xuanjing and Wang, Xiaoling},
+  booktitle={Thirty-Eighth Annual Conference on Neural Information Processing Systems},
+  year={2024}
+}
+```
+
+### Candle Framework
+```bibtex
+@software{candle,
+  title={Candle: Minimalist ML Framework},
+  author={HuggingFace},
+  year={2024},
+  url={https://github.com/huggingface/candle}
+}
+```
+
+## License
+
+- **SmolLM3 Model**: Apache 2.0
+- **This Implementation**: Follows the Candle framework license
+- **Candle Framework**: Apache 2.0 and MIT dual-licensed
+
+## Further Reading
+
+- **SmolLM Blog Series**: https://huggingface.co/blog/smollm and https://huggingface.co/blog/smollm3
+- **Model Card Details**: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
+- **NoPE Paper**: https://arxiv.org/abs/2410.01926
+- **Candle Documentation**: https://huggingface.github.io/candle/
+
+---
+
+This implementation stands on the shoulders of giants. Thank you to all the researchers, engineers, and open source contributors who make this work possible.

candle-transformers/src/models/smol/smollm3.rs

Lines changed: 1 addition & 1 deletion
@@ -463,7 +463,7 @@ impl ModelForCausalLM {
             .narrow(1, l - 1, 1)?
             .apply(&self.lm_head)
     }
-
+
     pub fn clear_kv_cache(&mut self) {
         self.base.clear_kv_cache();
    }
