- [NoPE Paper](https://arxiv.org/abs/2410.01926) - "Length Generalization of Causal Transformers without Position Encoding"

The SmolLM family of models represents cutting-edge work in efficient language models, demonstrating that small models can achieve impressive capabilities when trained on high-quality data.
The SmolLM project is developed by the Hugging Face team with contributions from researchers focused on efficient LLM architectures and training methods.
## NoPE Architecture

### Research Paper

**Title**: "Length Generalization of Causal Transformers without Position Encoding"

**Authors**:
- Jie Wang (Fudan University)
- Tao Ji (Fudan University)
- Yuanbin Wu (Fudan University)
- Hang Yan (Fudan University)
- Tao Gui (Fudan University)
- Qi Zhang (Fudan University)
- Xuanjing Huang (Fudan University)
- Xiaoling Wang (Fudan University)

**Published**: NeurIPS 2024 (Thirty-Eighth Annual Conference on Neural Information Processing Systems)

**Abstract Summary**: The paper demonstrates that removing positional encoding from selected layers (NoPE - No Positional Encoding) can improve length generalization in causal transformers while maintaining or improving performance. SmolLM3 implements this with a 3:1 RoPE/NoPE ratio.
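A 3:1 RoPE/NoPE ratio means roughly one in every four transformer layers drops positional encoding. The following is a minimal sketch of how such a schedule could be expressed; the 36-layer count and the "every fourth layer is NoPE" placement are assumptions for illustration, not the confirmed SmolLM3 configuration.

```python
# Illustrative sketch of a 3:1 RoPE/NoPE layer schedule.
# The layer count (36) and the "every fourth layer is NoPE" placement
# are assumptions, not the confirmed SmolLM3 configuration.
num_layers = 36

# True  -> layer applies rotary position embeddings (RoPE)
# False -> layer uses no positional encoding (NoPE)
use_rope = [(layer_idx + 1) % 4 != 0 for layer_idx in range(num_layers)]

rope_count = sum(use_rope)
nope_count = num_layers - rope_count
print(rope_count, nope_count)  # 27 RoPE layers, 9 NoPE layers -> 3:1 ratio
```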
**Resources**:
- **arXiv**: https://arxiv.org/abs/2410.01926
- **Conference**: NeurIPS 2024

### Key Innovation
The hybrid approach uses two kinds of layers (see the sketch after this list):
- **RoPE layers** (75%): Standard rotary positional embeddings for local context
- **NoPE layers** (25%): No positional encoding for improved length generalization
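Below is a minimal sketch of how a hybrid layer stack might apply this split inside attention, assuming a standard rotate-half RoPE formulation and a per-layer `use_rope` flag. Function names, shapes, and the schedule are illustrative assumptions, not SmolLM3's actual implementation.

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotate the last-dimension halves: (x1, x2) -> (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q: torch.Tensor, k: torch.Tensor,
               cos: torch.Tensor, sin: torch.Tensor):
    """Standard rotary embedding applied to query/key tensors."""
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

def prepare_qk(q, k, layer_idx, cos, sin, use_rope):
    """Per-layer choice: RoPE layers rotate q/k; NoPE layers leave them
    untouched, so ordering information comes only from the causal mask."""
    if use_rope[layer_idx]:
        return apply_rope(q, k, cos, sin)
    return q, k

# Tiny usage example with dummy tensors (hypothetical sizes: seq_len=4, head_dim=8).
seq_len, head_dim = 4, 8
q = torch.randn(1, 1, seq_len, head_dim)
k = torch.randn(1, 1, seq_len, head_dim)
pos = torch.arange(seq_len, dtype=torch.float32)
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
angles = torch.outer(pos, inv_freq)              # (seq, head_dim/2)
cos = torch.cat((angles, angles), dim=-1).cos()  # (seq, head_dim)
sin = torch.cat((angles, angles), dim=-1).sin()
use_rope = [True, True, True, False]             # 3:1 split over four layers
q0, k0 = prepare_qk(q, k, 0, cos, sin, use_rope)  # RoPE layer: q/k rotated
q3, k3 = prepare_qk(q, k, 3, cos, sin, use_rope)  # NoPE layer: q/k unchanged
```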
This implementation stands on the shoulders of giants. Thank you to all the researchers, engineers, and open source contributors who make this work possible.