
Conversation

@kaix-nv (Contributor) commented Oct 9, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix the FSDP command in the llm_sparsity example. See Slack.

Usage

 python hf_pts.py --model_name_or_path /home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B \
    --device cuda \
    --model_max_length 1024 \
    --dtype fp16 \
    --sparsity_fmt sparsegpt \
    --calib_size 128 \
    --output_dir saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/pts

bash launch_finetune.sh --model /home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B \
    --max_length 1024 \
    --num_epochs 3 \
    --restore_path saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/pts/pts_modelopt_state.pth \
    --output_dir saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/finetuned

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Summary by CodeRabbit

  • Bug Fixes
    • Fixed parsing of distributed-training options in an example launch script so multi-token flags are passed correctly across shells, reducing misconfiguration and runtime failures.
  • Chores
    • Added a required runtime dependency to the example to ensure compatibility with recent model tooling and prevent install/runtime errors (see the sketch below).
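
For reference, the dependency change is a single line in the example's requirements file; the excerpt and install command below are an illustrative sketch (only the transformers>=4.57.0 pin comes from this PR, the pip invocation is simply the usual way to apply it):

    # examples/llm_sparsity/requirements.txt (excerpt; line added by this PR)
    transformers>=4.57.0

    # Reinstall the example's requirements to pick up the new pin
    pip install -r examples/llm_sparsity/requirements.txt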

@kaix-nv kaix-nv requested a review from a team as a code owner October 9, 2025 05:18
coderabbitai bot (Contributor) commented Oct 9, 2025

Walkthrough

Removed the single quotes around two FSDP-related CLI arguments in examples/llm_sparsity/launch_finetune.sh (so their multi-word values are now passed as separate shell tokens) and added transformers>=4.57.0 to examples/llm_sparsity/requirements.txt. No other logic or arguments were modified.

Changes

  • Launch script FSDP args (examples/llm_sparsity/launch_finetune.sh): unquoted the FSDP flags, changing --fsdp 'full_shard auto_wrap' → --fsdp full_shard auto_wrap and --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' → --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer (see the sketch below).
  • Example requirements (examples/llm_sparsity/requirements.txt): added dependency transformers>=4.57.0.
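
To make the change list above concrete, the affected flags in examples/llm_sparsity/launch_finetune.sh move from the quoted to the unquoted form roughly as follows (the flag values come from the diff; the trailing backslashes and surrounding line-continuation layout are assumed from the review snippet quoted later in this thread):

    # Before: each multi-word value reaches the trainer as a single token
    --fsdp 'full_shard auto_wrap' \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \

    # After (this PR): quotes removed, so the shell splits the values into separate tokens
    --fsdp full_shard auto_wrap \
    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \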

Sequence Diagram(s)

N/A

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Thump-thump, I tweak with gentle hop,
Quotes un-nested—pop, pop, pop!
Flags now split, like carrots shared,
A new crate of transformers paired.
I twitch my nose and press to commit. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title “Fix the fsdp cmd in llm_sparsity” succinctly and accurately summarizes the primary change, namely correcting the FSDP command in the llm_sparsity example. It is clear, concise, and directly reflects the content of the pull request without extraneous wording.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb70a56 and 56232ad.

📒 Files selected for processing (2)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_sparsity/launch_finetune.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs


@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39f81e5 and e2c95a5.

📒 Files selected for processing (1)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality

Comment on lines +94 to +95
--fsdp full_shard auto_wrap \
--fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \

⚠️ Potential issue | 🔴 Critical

Keep the quotes on multi-word argument values.

Both --fsdp and --fsdp_transformer_layer_cls_to_wrap feed directly into finetune.py, whose argparse definitions (via transformers.TrainingArguments) expect a single string per flag. With the quotes removed, the shell now tokenizes auto_wrap as a separate argument, so the script sees --fsdp full_shard auto_wrap and errors with “unrecognized arguments: auto_wrap”. Same risk applies to the transformer layer flag. Please restore the quoting (or otherwise rejoin the values into a single token) to keep the command working.

🤖 Prompt for AI Agents
In examples/llm_sparsity/launch_finetune.sh around lines 94 to 95, the multi-word values for --fsdp and --fsdp_transformer_layer_cls_to_wrap were split into separate shell tokens; restore quoting (e.g., "--fsdp 'full_shard auto_wrap'" and "--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'") or otherwise join each multi-word value into a single token so argparse receives one string per flag.
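
To see the tokenization behavior described above in isolation, here is a minimal bash sketch; the count_args helper is hypothetical and only shows how the shell splits arguments before any Python argparse code ever runs:

    # Prints how many tokens the callee actually receives, one per line
    count_args() {
        echo "argc=$#"
        printf 'argv: %s\n' "$@"
    }

    count_args --fsdp 'full_shard auto_wrap'   # argc=2: one flag plus one value string
    count_args --fsdp full_shard auto_wrap     # argc=3: auto_wrap arrives as an extra token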

@codecov codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.36%. Comparing base (5b02483) to head (56232ad).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #417      +/-   ##
==========================================
- Coverage   73.36%   73.36%   -0.01%     
==========================================
  Files         180      180              
  Lines       17919    17919              
==========================================
- Hits        13147    13146       -1     
- Misses       4772     4773       +1     

☔ View full report in Codecov by Sentry.

@kaix-nv kaix-nv enabled auto-merge (squash) October 9, 2025 18:48
copy-pr-bot bot commented Oct 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaix-nv (Contributor, Author) commented Oct 9, 2025

@RalphMao @realAsma Could I get a codeowner approval? Thanks.

@kevalmorabia97 (Collaborator) commented:

@kaix-nv seems like your commits are not verified

@kaix-nv kaix-nv enabled auto-merge (squash) October 10, 2025 17:35
@kaix-nv kaix-nv force-pushed the kaix/fsdp_fix branch 4 times, most recently from 9ab1652 to eb70a56 on October 10, 2025 18:29
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) October 10, 2025 18:53
@kevalmorabia97 kevalmorabia97 merged commit 08fb23f into main Oct 10, 2025
27 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kaix/fsdp_fix branch October 10, 2025 19:01
