
Conversation

@casteryh
Contributor

@casteryh casteryh commented Sep 19, 2025

This fixes #190.

@meta-cla meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 19, 2025
  hf_state_dict = self.engine.checkpointer.sd_adapter.to_hf(flattened_state_dict)
  # TODO: Figure out how to gracefully handle which model-to-vLLM conversion is needed
- vllm_ready_hf_sd = _qwen3_hf_to_vllm(sd=hf_state_dict, num_layers=28)
+ vllm_ready_hf_sd = _qwen3_hf_to_vllm(sd=hf_state_dict)
Contributor

Could this just be simplified by using num_layers=self.model.config.num_hidden_layers? See this.

Ideally you should not need this method in the trainer at all; the trainer should be agnostic to the type/architecture of the generator.

Contributor

This should work with torchtitan:

num_layers=self.engine.model_args.n_layers

Contributor Author

I guess simply reading the maximum layer index via the regex has the advantage that it is agnostic to the trainer implementation, as long as the state dict is in Hugging Face format.
Let me know what you think. @Ritesh1905 @allenwang28
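
(Editor's note: a minimal sketch of the regex-based inference described above. The helper name infer_num_layers and the "model.layers.<i>." key pattern are assumptions about the Hugging Face state-dict layout for Qwen3-style checkpoints, not the code from this PR.)

```python
import re

def infer_num_layers(hf_state_dict):
    """Infer the transformer layer count from HF-style state-dict keys.

    Assumes keys of the form "model.layers.<i>.<...>" (0-based layer
    indices), which is the usual layout for HF-format checkpoints.
    """
    layer_pattern = re.compile(r"model\.layers\.(\d+)\.")
    indices = [
        int(m.group(1))
        for key in hf_state_dict
        if (m := layer_pattern.match(key))
    ]
    if not indices:
        raise ValueError("no 'model.layers.<i>.' keys found in state dict")
    return max(indices) + 1  # convert 0-based max index to a count
```

With something along these lines, the HF-to-vLLM conversion can derive the layer count from the state dict itself rather than taking it as an argument, which is what keeps it independent of the trainer.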

@casteryh casteryh closed this Sep 23, 2025
@casteryh
Contributor Author

Closed since the change is already included in #215.
