Added Convert Deepspeed to Huggingface Safetensors script #619
base: main
Conversation
Code Review
This pull request adds a useful script for converting DeepSpeed checkpoints to the Hugging Face Safetensors format. The script is well-structured and includes helpful features like validation. I've found a critical bug in argument parsing that would cause a crash, a security vulnerability with subprocess, and several areas for improvement regarding robustness, code clarity, and adherence to best practices. My comments include suggestions to fix these issues.
```python
ROOT = deepspeed_model_path
POLICY_DIR = ROOT / "policy"
HF_BASE = POLICY_DIR / "huggingface"
OUT_DIR = POLICY_DIR / "huggingface_converted" if not out_dir else out_dir
MERGED_FP32 = OUT_DIR / "merged_model"  # directory that will store the ultimate pytorch weights.
```
According to PEP 8, constants are named in all capital letters with underscores. These variables (ROOT, POLICY_DIR, etc.) are not true constants as their values are derived from function arguments. They should be named using snake_case (e.g., root, policy_dir) for better readability and to follow standard Python conventions. This would require updating their usage throughout the function.
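The suggested rename could look like the following sketch; the `resolve_dirs` helper and the returned dict are illustrative, not part of the PR, and the path layout simply mirrors the hunk above:

```python
from pathlib import Path

def resolve_dirs(deepspeed_model_path, out_dir=None):
    # Values derived from function arguments, so snake_case per PEP 8
    # (ALL_CAPS is reserved for true module-level constants).
    root = Path(deepspeed_model_path)
    policy_dir = root / "policy"
    hf_base = policy_dir / "huggingface"
    out = Path(out_dir) if out_dir else policy_dir / "huggingface_converted"
    merged_fp32 = out / "merged_model"
    return {"root": root, "policy_dir": policy_dir, "hf_base": hf_base,
            "out_dir": out, "merged_fp32": merged_fp32}
```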
```python
merged_bin = MERGED_FP32 / "pytorch_model.bin"
hf_model_bin = HF_BASE / "pytorch_model.bin"
shutil.copy2(merged_bin, hf_model_bin)
print(f"  Copied to: {hf_model_bin}")
```
Copying the merged model binary (pytorch_model.bin) into the HF_BASE directory is problematic. It modifies an input directory, which is a side effect that should be avoided. This copy is also redundant because you are already loading the state dictionary from merged_bin and then explicitly loading it into the model with model.load_state_dict(state, strict=False). The from_pretrained call will initialize a model from the config (with random weights if no checkpoint is found), and load_state_dict will then correctly populate its weights.
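If a copy is wanted at all, one way to avoid mutating the input directory is to stage the merged checkpoint under the *output* directory instead. A stdlib-only sketch (the `stage_weights` helper and its name are hypothetical, not from the PR):

```python
import shutil
from pathlib import Path

def stage_weights(merged_bin: Path, out_dir: Path) -> Path:
    # Copy the merged checkpoint into the output directory rather than
    # HF_BASE, so the input directory is never modified; the state dict
    # is still loaded explicitly via model.load_state_dict() afterwards.
    out_dir.mkdir(parents=True, exist_ok=True)
    staged = out_dir / merged_bin.name
    shutil.copy2(merged_bin, staged)
    return staged
```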
```python
# === 3. Load HF config and initialize model ===
print("[3/5] Initializing Hugging Face model ...")
model = AutoModelForCausalLM.from_pretrained(HF_BASE, torch_dtype=torch.float16)
```
The script hardcodes AutoModelForCausalLM when initializing the model. You have a helper function guess_hf_class that can determine the correct model class from the configuration, which would make the script more robust and applicable to a wider range of models (e.g., sequence-to-sequence). This should be used here, and also in the final summary message on line 107. I've also added trust_remote_code=True for consistency with the validate_load function, as it's often required for custom models.
```diff
-model = AutoModelForCausalLM.from_pretrained(HF_BASE, torch_dtype=torch.float16)
+cfg = AutoConfig.from_pretrained(HF_BASE, trust_remote_code=True)
+HFClass = guess_hf_class(cfg)
+model = HFClass.from_pretrained(HF_BASE, torch_dtype=torch.float16, trust_remote_code=True)
```
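For illustration, a framework-free sketch of what a `guess_hf_class`-style mapping might do; the helper name, return-as-string design, and the seq2seq heuristic are assumptions here, not the PR's actual implementation. It inspects the config's `architectures` field and falls back to the causal-LM class:

```python
def guess_hf_class_name(architectures=None):
    # Pick an Auto* class name from a config's `architectures` list.
    # Encoder-decoder heads map to AutoModelForSeq2SeqLM; everything
    # else falls back to AutoModelForCausalLM (decoder-only default).
    arch = (architectures or [""])[0]
    if "ForConditionalGeneration" in arch or "Seq2Seq" in arch:
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"
```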
```python
return Path(OUT_DIR)
```

```python
def guess_hf_class(cfg: AutoConfig):
```
A bit excessive: we only support training decoder-only `AutoModelForCausalLM` archs right now.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```python
# === Directories ===
def main(deepspeed_model_path: Path, out_dir: Path = None) -> Path:
    ROOT = deepspeed_model_path
    POLICY_DIR = ROOT / "policy"
```
When I ran with the Deepspeed backend, my checkpoint dir didn't have a `policy` folder. I'd honestly prefer to remove that and assume a structure like:

```
some_deepspeed_checkpoint/
├── latest
├── global_step123/
│   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│   ├── zero_pp_rank_0_mp_rank_00_model_states.pt
│   ├── ...
├── global_step124/
│   ├── zero_pp_rank_0_mp_rank_00_optim_states.pt
│   ├── zero_pp_rank_0_mp_rank_00_model_states.pt
│   ├── ...
└── zero_to_fp32.py
```
Unless I just ran it wrong and the policy dir is expected
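If the script adopted that flat layout, resolving the newest step directory could look like this sketch. It assumes DeepSpeed's usual convention of a `latest` tag file naming the most recent `global_step*` directory; the helper name is illustrative:

```python
from pathlib import Path

def latest_checkpoint_dir(ckpt_root: Path) -> Path:
    # DeepSpeed writes a "latest" file containing the tag of the most
    # recent checkpoint (e.g. "global_step123"); resolve it to the dir.
    tag = (ckpt_root / "latest").read_text().strip()
    step_dir = ckpt_root / tag
    if not step_dir.is_dir():
        raise FileNotFoundError(f"checkpoint dir {step_dir} not found")
    return step_dir
```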
```python
# === 3. Load HF config and initialize model ===
print("[3/5] Initializing Hugging Face model ...")
model = AutoModelForCausalLM.from_pretrained(HF_BASE, torch_dtype=torch.bfloat16)
```
nit: `torch_dtype` is being deprecated in favor of `dtype`
Thank you so much @pbokc for your comments. I will rerun Deepspeed, take a look, and get back to you.
@zhenga1 we can close since the Deepspeed backend is being deprecated.
Wrote a script for converting a Deepspeed model shard into Hugging Face `.safetensors`.

Filepath:
`~/skyrl-train/scripts/convert_deepspeed_to_hf.py`

Tested on Qwen0.5B and other models trained with `run_gsm8k.sh` using the Deepspeed backend.
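The review above mentions a bug in argument parsing; a minimal sketch of how the CLI wiring for such a script could look is below. The flag names and parser layout here are illustrative assumptions, not the script's confirmed interface:

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI for a convert_deepspeed_to_hf.py-style script:
    # one required checkpoint path, one optional output directory.
    p = argparse.ArgumentParser(
        description="Convert a DeepSpeed checkpoint to HF safetensors")
    p.add_argument("deepspeed_model_path", type=Path,
                   help="root of the DeepSpeed checkpoint")
    p.add_argument("--out-dir", type=Path, default=None,
                   help="where to write the converted model")
    return p
```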