[update] Updated RoPE Configuration for HF Models (transformers) w. backward-compatible support for vLLM #690 #703
base: main
Conversation
/gemini review
…efault but allows for override)
/gemini review
Code Review
This pull request effectively updates the RoPE configuration to use the new rope_parameters scheme from the HuggingFace transformers library, replacing the deprecated rope_scaling and rope_theta. The changes are consistently applied across documentation, configuration files, and model loading logic. I appreciate the backward-compatible support for vLLM, which is a thoughtful addition.
I've identified a potential issue where a DictConfig object might be passed to vLLM instead of a standard Python dict, which could lead to runtime errors. I've also suggested a couple of minor improvements to make the configuration handling more robust by ensuring OmegaConf interpolations are resolved. Overall, this is a solid update.
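A minimal sketch of the DictConfig-vs-dict point above (the helper name to_plain_dict is an assumption, not code from the PR): an OmegaConf node can be resolved into a plain Python dict before it is handed to vLLM.

```python
from omegaconf import DictConfig, OmegaConf

def to_plain_dict(rope_cfg):
    """Normalize a RoPE config node into a plain Python dict before passing it to vLLM.

    vLLM expects standard containers, so OmegaConf DictConfig nodes are converted
    first; resolve=True expands any ${...} interpolations along the way.
    """
    if isinstance(rope_cfg, DictConfig):
        return OmegaConf.to_container(rope_cfg, resolve=True)
    return rope_cfg
```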
SumanthRH
left a comment
Can you add a test for this?
You can add it to skyrl-train/tests/gpu/gpu_ci/test_model_wrapper.py to check that the RoPE parameters are being set properly.
Also, have you done any E2E test for this? A good way would be to override the skyrl config in a script and confirm that the parameters are propagated all the way to the AutoModel init.
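For reference, a rough sketch of the kind of check being asked for; the helper apply_rope_parameters, the model name, and the parameter values are all assumptions, and the real test would exercise the project's model wrapper rather than setting the config attribute directly:

```python
from transformers import AutoConfig

# Hypothetical stand-in for the project's model wrapper; a real test would
# import the wrapper from skyrl-train and build the model through it instead.
def apply_rope_parameters(config, rope_parameters):
    config.rope_parameters = dict(rope_parameters)
    return config

def test_rope_parameters_are_set():
    # Illustrative values only (mirroring the YARN-style fields in the YAML config).
    rope_parameters = {
        "rope_type": "yarn",
        "factor": 1.0,
        "original_max_position_embeddings": 32768,
    }
    config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    config = apply_rope_parameters(config, rope_parameters)

    assert config.rope_parameters["rope_type"] == "yarn"
    assert config.rope_parameters["factor"] == 1.0
    assert config.rope_parameters["original_max_position_embeddings"] == 32768
```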
/gemini review
erictang000
left a comment
Assuming you've run something like the basic gsm8k example e2e and everything still works?
Other than that, lgtm aside from one nit.
```yaml
#   factor: 1.0
#   original_max_position_embeddings: 32768

step_wise_training: false
```
can remove this for now, since the step wise training PR was reverted
fixed, and e2e is working (i tried it on a bunch of different rope configuration runs)
/gemini review
Code Review
This pull request effectively updates the RoPE configuration to align with the new rope_parameters scheme from HuggingFace's transformers, while maintaining backward compatibility for vLLM and older configurations. The updates to documentation, YAML files, and the addition of a dedicated test case are well-executed. However, there is a critical oversight: the RoPE configuration is not applied to the critic model. The get_llm_for_sequence_regression function, used to create the critic, was not updated to handle rope_parameters. This means the critic will be created without the specified RoPE settings, which could negatively impact training. To resolve this, get_llm_for_sequence_regression should be updated to accept and apply rope_parameters to the model's configuration, and the critic worker initializations in deepspeed_worker.py and fsdp_worker.py must be updated to pass this configuration.
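As a hedged sketch of the fix being suggested here: only the function name comes from the review, so the signature, defaults, and placeholder body below are assumptions, with just the rope_parameters plumbing shown.

```python
from typing import Optional
from transformers import AutoConfig

# Sketch only: the real get_llm_for_sequence_regression lives in the skyrl-train
# codebase and takes many more arguments; only the part Gemini flags as missing
# for the critic is illustrated here.
def get_llm_for_sequence_regression(model_name_or_path: str, rope_parameters: Optional[dict] = None, **kwargs):
    config = AutoConfig.from_pretrained(model_name_or_path)
    if rope_parameters:
        # Apply the same RoPE overrides the policy model receives so the critic
        # is built with a consistent long-context setup.
        config.rope_parameters = dict(rope_parameters)
    # ... construct and return the value-head / critic model from `config` ...
    return config  # placeholder return for this sketch
```

Under that assumption, the critic worker initializations in deepspeed_worker.py and fsdp_worker.py would then pass the configured rope_parameters through to this call.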
erictang000
left a comment
gemini makes a good point, can we plumb this through to the critic as well? It shouldn't be too much more code.
/gemini review
Code Review
This pull request effectively modernizes the RoPE configuration by adopting the new rope_parameters standard from the transformers library, while thoughtfully maintaining backward compatibility for vLLM. The changes are well-organized, touching documentation, configuration files, and application logic consistently. A new utility function centralizes the migration logic from the deprecated rope_scaling and rope_theta parameters, which is a great approach. The addition of unit tests for the new model wrapper logic is also a valuable contribution. I've identified one potential issue in the new configuration handling logic that could lead to a runtime error with a malformed config. My suggestion addresses this to make the implementation more robust.
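The centralizing utility itself is not reproduced in this thread; a rough sketch of what migrating the deprecated fields could look like (the function name and exact key handling are assumptions):

```python
def migrate_rope_config(rope_scaling=None, rope_theta=None):
    """Fold the deprecated rope_scaling / rope_theta fields into the single
    rope_parameters dict used by newer transformers releases.

    Sketch only; the PR's actual helper may handle more cases
    (DictConfig inputs, validation, warnings for conflicting values).
    """
    rope_parameters = {}
    if rope_scaling:
        rope_parameters.update(dict(rope_scaling))
    if rope_theta is not None:
        rope_parameters["rope_theta"] = rope_theta
    return rope_parameters or None
```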
/gemini review
Code Review
This pull request effectively updates the RoPE configuration to align with the newer rope_parameters scheme from HuggingFace's transformers library, deprecating the old rope_scaling and rope_theta fields. The changes are consistently applied across documentation, configuration files, model wrappers, and worker initializations. I appreciate the backward-compatible support for vLLM, which is handled cleanly by reconstructing the old configuration format. The addition of unit tests to verify the new configuration logic is also a great touch. I have a couple of minor suggestions in trainer_utils.py to enhance code clarity and maintainability. Overall, this is a well-executed and thorough update.
```python
rope_scaling_dict = (
    OmegaConf.to_container(rope_scaling, resolve=True)
    if isinstance(rope_scaling, DictConfig)
    else rope_scaling
)
```
This conditional expression to create rope_scaling_dict can be simplified. OmegaConf.to_container handles non-DictConfig inputs correctly by returning them as-is. You can simplify this to an unconditional call, which also makes it more consistent with how rope_parameters_new is handled later in the function.

```python
rope_scaling_dict = OmegaConf.to_container(rope_scaling, resolve=True)
```

Quoted lines:

```python
if new_params is not None:
    logger.warning(f"Ignoring 'rope_parameters' as it is not a dictionary. Found: {new_params}")
    return {}
```
The if new_params is not None: check is redundant. The has_new_config check on line 669 already ensures that rope_parameters_new is not None, and OmegaConf.to_container will not return None for a non-None input. This block can be simplified by removing the conditional check.

```python
logger.warning(f"Ignoring 'rope_parameters' as it is not a dictionary. Found: {new_params}")
return {}
```
@devpatelio can you resolve conflicts with main?
HuggingFace's transformers library has an updated RoPE configuration scheme that removes rope_scaling and rope_theta and replaces them with a single rope_parameters configuration.
We first updated the YAML config to support the new config template, then updated all trainer utils and call sites to use it. For the vLLM endpoint, we temporarily translate the updated RoPE config back into the old format (separate rope_scaling and rope_theta), since vLLM has not adopted the new scheme yet. This shim will be removed once vLLM is updated. See the vLLM docs for more info.
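As a sketch of that temporary vLLM shim (names and key handling assumed; not the PR's exact code), the unified rope_parameters dict is split back into the separate rope_scaling and rope_theta fields that vLLM still expects:

```python
def split_rope_parameters_for_vllm(rope_parameters):
    """Translate the unified rope_parameters dict back into the
    (rope_scaling, rope_theta) pair still consumed by vLLM.

    Temporary shim; remove once vLLM supports rope_parameters natively.
    """
    if not rope_parameters:
        return None, None
    params = dict(rope_parameters)
    rope_theta = params.pop("rope_theta", None)
    rope_scaling = params or None  # remaining keys: rope_type, factor, ...
    return rope_scaling, rope_theta


# Example with illustrative values:
# split_rope_parameters_for_vllm({"rope_type": "yarn", "factor": 1.0, "rope_theta": 1000000.0})
# -> ({"rope_type": "yarn", "factor": 1.0}, 1000000.0)
```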