
Conversation

@gziz
Contributor

@gziz gziz commented Jan 3, 2026

Summary

Fixes #939
This PR updates the YaRN RoPE (Rotary Position Embedding) implementation in the OLMO 3 standalone notebooks to align with the HuggingFace reference implementation, resolving an issue where generation degraded into gibberish after a few sentences.

Background

The current implementation handles YaRN by scaling the position indices directly:

```python
# Current approach: divide positions by the scaling factor and clamp
if rope_type == "yarn":
    positions = positions / rope_factor
    positions = torch.clamp(positions, max=rope_orig_max - 1)
```

After reviewing the YaRN paper and the HuggingFace implementation, it appears YaRN should instead scale the inverse frequencies, which handles extended context lengths more gracefully.
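One way to see the problem with position scaling is that the clamp collapses all sufficiently large positions onto the same index, so tokens past that point become positionally indistinguishable. A minimal sketch with toy values (not the actual OLMO 3 config):

```python
import torch

# Illustrative values only: original context length 8, scaling factor 4
rope_orig_max = 8
rope_factor = 4.0

positions = torch.arange(0, 40, dtype=torch.float32)
scaled = torch.clamp(positions / rope_factor, max=rope_orig_max - 1)

# Every position >= rope_factor * (rope_orig_max - 1) = 28 collapses to 7.0
print(scaled[26:32])  # tensor([6.5000, 6.7500, 7.0000, 7.0000, 7.0000, 7.0000])
```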

Proposed Changes

Updated the YaRN algorithm to match the HuggingFace transformers implementation:

  1. Frequency-dependent scaling: different frequency components are scaled differently
  2. Added beta_fast and beta_slow parameters (32.0 and 1.0 respectively, matching OLMO 3's config)
  3. Linear ramp blending: Smoothly interpolates between:
    • High frequencies → unchanged (extrapolation)
    • Low frequencies → scaled by rope_factor (interpolation)
```python
# Updated approach: scale inverse frequencies instead of positions
inv_freq_extrapolation = 1.0 / pos_freqs
inv_freq_interpolation = 1.0 / (rope_factor * pos_freqs)

low, high = find_correction_range(beta_fast, beta_slow, dim, theta_base, rope_orig_max)
inv_freq_extrapolation_factor = 1 - linear_ramp_factor(low, high, dim // 2)

inv_freq = (
    inv_freq_interpolation * (1 - inv_freq_extrapolation_factor)
    + inv_freq_extrapolation * inv_freq_extrapolation_factor
)
```
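The snippet above calls two helpers, `find_correction_range` and `linear_ramp_factor`, that are not shown here. For context, a sketch of how they are defined in the HuggingFace transformers YaRN implementation (signatures chosen to match the call sites above; the notebook versions may differ slightly):

```python
import math
import torch

def find_correction_dim(num_rotations, dim, base, max_position_embeddings):
    # Dimension index at which a frequency completes `num_rotations` full
    # rotations over the original context length (inverse of the rotation count)
    return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (
        2 * math.log(base)
    )

def find_correction_range(low_rot, high_rot, dim, base, max_position_embeddings):
    # Dimension range over which interpolation and extrapolation are blended
    low = math.floor(find_correction_dim(low_rot, dim, base, max_position_embeddings))
    high = math.ceil(find_correction_dim(high_rot, dim, base, max_position_embeddings))
    return max(low, 0), min(high, dim - 1)  # clamp to valid dimension indices

def linear_ramp_factor(low, high, dim):
    # Ramp from 0 to 1 across [low, high], clamped to [0, 1] outside it
    if low == high:
        high += 0.001  # avoid division by zero
    linear_func = (torch.arange(dim, dtype=torch.float32) - low) / (high - low)
    return torch.clamp(linear_func, 0, 1)
```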

Key modifications in both notebooks

  • compute_rope_params function: Updated YaRN logic to use frequency-based scaling
  • OLMO3_CONFIG_7B: Added beta_fast: 32.0 and beta_slow: 1.0 parameters
  • OLMO3_CONFIG_32B: Added beta_fast: 32.0 and beta_slow: 1.0 parameters
  • Olmo3Model.__init__: Updated to pass beta_fast and beta_slow to compute_rope_params
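As an illustration of the config change, the rope-related entries might look like the following sketch (only the keys mentioned above are shown; the surrounding fields of the actual notebook dict, and the `rope_factor` value here, are placeholders):

```python
# Illustrative sketch of the rope-related entries in OLMO3_CONFIG_7B;
# the real notebook dict contains many more fields (vocab size, dims, etc.)
OLMO3_CONFIG_7B = {
    "rope_type": "yarn",
    "rope_factor": 8.0,  # placeholder value for illustration
    "beta_fast": 32.0,   # new: boundary for high-frequency (extrapolated) dims
    "beta_slow": 1.0,    # new: boundary for low-frequency (interpolated) dims
}
```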

Example Outputs

Prompt: "Tell me about large language models"

Response (current approach):

Large language models 
(LLMs are artificial intelligence models (LLMs) are a type of artificial neural networks trained to learn to predict the next word or next token (word given a sequence of text. They are trained on a text. The are trained on huge dataset of text. The dataset of all the internet texts. They can generate text. They can generate text. They can answer question, code, write code, story, poem, song, code, code, code, code, code, code, code, code, code, essay, code, code, code
...

Response (with proposed fix):

Certainly! Large Language Models (LLMs) are a type of artificial intelligence designed to understand, generate, and respond to human language. Here's an overview:

---

### **What are Large Language Models?**

LLMs are deep learning models trained on vast amounts of text data from books, websites, articles, and other sources. They use neural network architectures—most commonly transformer models (like the one in GPT or BERT)—that allow them to recognize patterns and relationships in language.

---

### **Key Features**

1. **Scale:**  
   - "Large" refers to both the size of the model (in billions or trillions of parameters) and the massive datasets used for training.

... (truncated — response continues coherently)

Testing

  • Verified coherent text generation with Olmo-3-7B-Instruct
  • Output quality now matches the official HuggingFace implementation
  • All 3 tests in the olmo3/tests directory pass as expected
[Screenshot, 2026-01-03]

Thank you for considering this contribution! Please let me know if you have any questions or would like me to make any adjustments.

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@rasbt
Owner

rasbt commented Jan 3, 2026

Thanks a lot for the PR. I added the original YaRN configs from HF to my layer debugger tool, and you were right: this is necessary to match the HF outputs. It all looks correct to me. Thanks for this great PR!

@rasbt rasbt merged commit 491fd58 into rasbt:main Jan 4, 2026
13 checks passed
@gziz
Contributor Author

gziz commented Jan 4, 2026

Glad I could help :)

Successfully merging this pull request may close these issues: Olmo 3 YaRN RoPE implementation bug (#939)