Conversation

@devpatelio (Collaborator)

HuggingFace's transformers library has an updated RoPE configuration scheme which removes rope_scaling and rope_theta and replaces them with a single rope_parameters configuration.

We updated the RoPE config by first updating the YAML config to support the new rope_parameters template, and then updating all trainer utils and call sites to use the new config. For the vLLM endpoint, we temporarily translate the updated YAML config back into the format vLLM expects (separate rope_scaling and rope_theta), since vLLM hasn't adopted the new scheme yet. This shim will be removed once vLLM is updated. See the vLLM docs for more info.
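For illustration, that translation back to vLLM's legacy format could look roughly like the sketch below. This is not the PR's actual code: the function name and the assumption that rope_theta is stored inside the rope_parameters mapping are mine.

    from typing import Optional, Tuple

    def split_rope_parameters_for_vllm(
        rope_parameters: Optional[dict],
    ) -> Tuple[Optional[dict], Optional[float]]:
        # Fold the unified rope_parameters mapping back into the legacy
        # (rope_scaling, rope_theta) pair that vLLM still expects.
        if not rope_parameters:
            return None, None
        rope_theta = rope_parameters.get("rope_theta")
        rope_scaling = {k: v for k, v in rope_parameters.items() if k != "rope_theta"}
        return (rope_scaling or None), rope_theta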

@devpatelio (Collaborator, Author)

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

…efault but allows for override)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
gemini-code-assist[bot]

This comment was marked as resolved.

@devpatelio (Collaborator, Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively updates the RoPE configuration to use the new rope_parameters scheme from the HuggingFace transformers library, replacing the deprecated rope_scaling and rope_theta. The changes are consistently applied across documentation, configuration files, and model loading logic. I appreciate the backward-compatible support for vLLM, which is a thoughtful addition.

I've identified a potential issue where a DictConfig object might be passed to vLLM instead of a standard Python dict, which could lead to runtime errors. I've also suggested a couple of minor improvements to make the configuration handling more robust by ensuring OmegaConf interpolations are resolved. Overall, this is a solid update.
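The DictConfig concern boils down to making sure vLLM only ever receives a resolved, plain Python dict. A minimal sketch of that conversion with OmegaConf (an assumed helper, not the PR's code):

    from omegaconf import DictConfig, OmegaConf

    def to_plain_dict(cfg):
        # Resolve interpolations and convert to a built-in dict before handing
        # the RoPE settings to vLLM; plain dicts pass through untouched.
        if isinstance(cfg, DictConfig):
            return OmegaConf.to_container(cfg, resolve=True)
        return cfg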

@devpatelio requested a review from SumanthRH (November 25, 2025, 03:25)
@SumanthRH (Member) left a comment

Can you add a test for this?

You can add it to skyrl-train/tests/gpu/gpu_ci/test_model_wrapper.py to test that the RoPE parameters are being set properly.

Also, have you done any E2E test for this? A good way would be to override the skyrl config in a script and confirm that the parameters are propagated all the way to the AutoModel init.
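For reference, the requested test could look something like the sketch below. The get_actor_model helper, the model name, and the rope_parameters attribute on the resulting HF config are all assumptions standing in for the repo's actual model wrapper API:

    def test_rope_parameters_are_applied():
        rope_parameters = {"rope_type": "yarn", "factor": 2.0, "rope_theta": 1_000_000.0}
        # get_actor_model is a hypothetical stand-in for the repo's model wrapper.
        model = get_actor_model(
            "Qwen/Qwen2.5-0.5B-Instruct",
            rope_parameters=rope_parameters,
        )
        hf_config = model.config
        assert hf_config.rope_parameters["rope_type"] == "yarn"
        assert hf_config.rope_parameters["factor"] == 2.0
        assert hf_config.rope_parameters["rope_theta"] == 1_000_000.0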

SumanthRH and others added 7 commits November 24, 2025 19:55
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@devpatelio (Collaborator, Author)

/gemini review

gemini-code-assist[bot]

This comment was marked as outdated.

@erictang000 (Collaborator) left a comment

Assuming you've run something like the basic gsm8k example end-to-end and everything still works?

Other than that, LGTM apart from one nit.

# factor: 1.0
# original_max_position_embeddings: 32768

step_wise_training: false
Collaborator

Can remove this for now, since the step-wise training PR was reverted.

Collaborator Author

Fixed, and E2E is working (I tried it on a bunch of different RoPE configuration runs).

@devpatelio (Collaborator, Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively updates the RoPE configuration to align with the new rope_parameters scheme from HuggingFace's transformers, while maintaining backward compatibility for vLLM and older configurations. The updates to documentation, YAML files, and the addition of a dedicated test case are well-executed.

However, there is a critical oversight: the RoPE configuration is not applied to the critic model. The get_llm_for_sequence_regression function, used to create the critic, was not updated to handle rope_parameters. This means the critic will be created without the specified RoPE settings, which could negatively impact training. To resolve this, get_llm_for_sequence_regression should be updated to accept and apply rope_parameters to the model's configuration, and the critic worker initializations in deepspeed_worker.py and fsdp_worker.py must be updated to pass this configuration.
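For concreteness, the fix being asked for might look roughly like the following. The helper name is mine, the exact signature of get_llm_for_sequence_regression in this repo is assumed, and the rope_parameters attribute follows the new transformers scheme:

    from typing import Optional
    from transformers import PretrainedConfig

    def apply_rope_parameters(config: PretrainedConfig, rope_parameters: Optional[dict]) -> PretrainedConfig:
        # Mirror the actor-side behavior so the critic sees the same RoPE settings.
        if rope_parameters:
            config.rope_parameters = dict(rope_parameters)
        return config

    # Inside get_llm_for_sequence_regression (and the critic inits in
    # deepspeed_worker.py / fsdp_worker.py) the call might look like:
    #     config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
    #     config = apply_rope_parameters(config, rope_parameters)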

@erictang000 (Collaborator) left a comment

Gemini makes a good point: plumb it through to the critic. Should not be too much more code?

@devpatelio (Collaborator, Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively modernizes the RoPE configuration by adopting the new rope_parameters standard from the transformers library, while thoughtfully maintaining backward compatibility for vLLM. The changes are well-organized, touching documentation, configuration files, and application logic consistently. A new utility function centralizes the migration logic from the deprecated rope_scaling and rope_theta parameters, which is a great approach. The addition of unit tests for the new model wrapper logic is also a valuable contribution. I've identified one potential issue in the new configuration handling logic that could lead to a runtime error with a malformed config. My suggestion addresses this to make the implementation more robust.
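As a rough illustration (not the PR's implementation), a migration helper of that kind could fold the deprecated fields into the unified mapping like so:

    from typing import Optional

    def migrate_legacy_rope(
        rope_scaling: Optional[dict] = None,
        rope_theta: Optional[float] = None,
    ) -> Optional[dict]:
        # Fold the deprecated rope_scaling/rope_theta pair into a single
        # rope_parameters-style mapping; return None when nothing was set.
        rope_parameters = dict(rope_scaling or {})
        if rope_theta is not None:
            rope_parameters["rope_theta"] = rope_theta
        return rope_parameters or None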

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@devpatelio (Collaborator, Author)

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request effectively updates the RoPE configuration to align with the newer rope_parameters scheme from HuggingFace's transformers library, deprecating the old rope_scaling and rope_theta fields. The changes are consistently applied across documentation, configuration files, model wrappers, and worker initializations. I appreciate the backward-compatible support for vLLM, which is handled cleanly by reconstructing the old configuration format. The addition of unit tests to verify the new configuration logic is also a great touch. I have a couple of minor suggestions in trainer_utils.py to enhance code clarity and maintainability. Overall, this is a well-executed and thorough update.

Comment on lines +680 to +684
rope_scaling_dict = (
    OmegaConf.to_container(rope_scaling, resolve=True)
    if isinstance(rope_scaling, DictConfig)
    else rope_scaling
)
Contributor

medium

This conditional expression to create rope_scaling_dict can be simplified. OmegaConf.to_container handles non-DictConfig inputs correctly by returning them as-is. You can simplify this to an unconditional call, which also makes it more consistent with how rope_parameters_new is handled later in the function.

            rope_scaling_dict = OmegaConf.to_container(rope_scaling, resolve=True)

Comment on lines +697 to +699
if new_params is not None:
    logger.warning(f"Ignoring 'rope_parameters' as it is not a dictionary. Found: {new_params}")
    return {}
Contributor

medium

The if new_params is not None: check is redundant. The has_new_config check on line 669 already ensures that rope_parameters_new is not None, and OmegaConf.to_container will not return None for a non-None input. This block can be simplified by removing the conditional check.

        logger.warning(f"Ignoring 'rope_parameters' as it is not a dictionary. Found: {new_params}")
        return {}

@SumanthRH (Member)

@devpatelio can you resolve conflicts with main?
