Fix Olmo3 YaRN RoPE implementation bug #940
Merged
+157
−42
Summary
Fixes #939
This PR updates the YaRN RoPE (Rotary Position Embedding) implementation in the OLMO 3 standalone notebooks to align with the HuggingFace reference implementation. This resolves an issue where generated text degraded into gibberish after a few sentences.
Background
The current implementation scales position indices for YaRN like:
After reviewing the YaRN paper and the HuggingFace implementation, it appears YaRN should instead scale inverse frequencies, which allows for better handling of extended context lengths.
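To make the frequency-based approach concrete, here is a minimal plain-Python sketch of how YaRN can blend unscaled (extrapolated) and scaled (interpolated) inverse frequencies, loosely following the structure of the HuggingFace transformers implementation. The function name `yarn_inv_freq` and the argument values in the usage note are illustrative assumptions, not the notebook's actual code; only the `beta_fast` and `beta_slow` defaults (32.0 and 1.0) come from this PR.

```python
import math


def yarn_inv_freq(head_dim, base, factor, original_max_len,
                  beta_fast=32.0, beta_slow=1.0):
    """Return YaRN-adjusted inverse frequencies as a list of floats.

    Hypothetical helper for illustration; not the notebook's function.
    """

    def correction_dim(num_rotations):
        # Dimension index at which a frequency completes `num_rotations`
        # full rotations over the original context length
        return (head_dim
                * math.log(original_max_len / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    # Range of dimension indices over which the two regimes are blended
    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(head_dim // 2):
        extrapolated = 1.0 / (base ** (2 * i / head_dim))  # original RoPE
        interpolated = extrapolated / factor               # scaled by rope factor
        # Linear ramp: 0 below `low`, 1 above `high`
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        keep = 1.0 - ramp  # weight of the unscaled frequency
        inv_freq.append(interpolated * (1.0 - keep) + extrapolated * keep)
    return inv_freq
```

For example, `yarn_inv_freq(128, 500_000.0, 8.0, 8192)` returns 64 values. With these illustrative settings, the highest-frequency dimensions are left unscaled and the lowest-frequency dimensions are divided by the factor, with a linear blend in between. Position indices themselves are not modified.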
Proposed Changes
Updated the YaRN algorithm to match the HuggingFace transformers implementation
- `beta_fast` and `beta_slow` parameters (32.0 and 1.0 respectively, matching OLMO 3's config)
- `rope_factor` (interpolation)
Key modifications in both notebooks
- `compute_rope_params` function: Updated YaRN logic to use frequency-based scaling
- `OLMO3_CONFIG_7B`: Added `beta_fast: 32.0` and `beta_slow: 1.0` parameters
- `OLMO3_CONFIG_32B`: Added `beta_fast: 32.0` and `beta_slow: 1.0` parameters
- `Olmo3Model.__init__`: Updated to pass `beta_fast` and `beta_slow` to `compute_rope_params`
Example Outputs
Prompt:
"Tell me about large language models"
Response (current approach):
Response (with proposed fix):
Testing
Olmo-3-7B-Instruct
References
Thank you for considering this contribution! Please let me know if you have any questions or would like me to make any adjustments.