
Conversation

@kaix-nv (Contributor) commented Oct 9, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix the FSDP command in the llm_sparsity example. See Slack.

Usage

 python hf_pts.py --model_name_or_path /home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B \
    --device cuda \
    --model_max_length 1024 \
    --dtype fp16 \
    --sparsity_fmt sparsegpt \
    --calib_size 128 \
    --output_dir saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/pts

bash launch_finetune.sh --model /home/scratch.omniml_data_1/models/llama3.1/Meta-Llama-3.1-8B \
    --max_length 1024 \
    --num_epochs 3 \
    --restore_path saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/pts/pts_modelopt_state.pth \
    --output_dir saved_models_Llama-2-7b-hf_sparsegpt_tp1_pp1/finetuned

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Summary by CodeRabbit

  • Bug Fixes
    • Fixed parsing of distributed-training options in an example launch script so multi-token flags are passed correctly across shells, reducing misconfiguration and runtime failures.
  • Chores
    • Added a required runtime dependency to the example to ensure compatibility with recent model tooling and prevent install/runtime errors (see the sketch below).
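
For reference, the dependency change is a single line in the example's requirements file; the excerpt and install command below are an illustrative sketch (only the transformers>=4.57.0 pin comes from this PR, the pip invocation is simply the usual way to apply it):

    # examples/llm_sparsity/requirements.txt (excerpt; line added by this PR)
    transformers>=4.57.0

    # Reinstall the example's requirements to pick up the new pin
    pip install -r examples/llm_sparsity/requirements.txt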

@kaix-nv kaix-nv requested a review from a team as a code owner October 9, 2025 05:18
coderabbitai bot (Contributor) commented Oct 9, 2025

Walkthrough

Removed the single quotes around two FSDP-related CLI arguments in examples/llm_sparsity/launch_finetune.sh (so their multi-word values are now passed as separate shell tokens) and added transformers>=4.57.0 to examples/llm_sparsity/requirements.txt. No other logic or arguments were modified.

Changes

  • Launch script FSDP args (examples/llm_sparsity/launch_finetune.sh): unquoted the FSDP flags, changing --fsdp 'full_shard auto_wrap' → --fsdp full_shard auto_wrap and --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' → --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer (see the sketch below).
  • Example requirements (examples/llm_sparsity/requirements.txt): added dependency transformers>=4.57.0.
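
To make the change list above concrete, the affected flags in examples/llm_sparsity/launch_finetune.sh move from the quoted to the unquoted form roughly as follows (the flag values come from the diff; the trailing backslashes and surrounding line-continuation layout are assumed from the review snippet quoted later in this thread):

    # Before: each multi-word value reaches the trainer as a single token
    --fsdp 'full_shard auto_wrap' \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \

    # After (this PR): quotes removed, so the shell splits the values into separate tokens
    --fsdp full_shard auto_wrap \
    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \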

Sequence Diagram(s)

N/A

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Thump-thump, I tweak with gentle hop,
Quotes un-nested—pop, pop, pop!
Flags now split, like carrots shared,
A new crate of transformers paired.
I twitch my nose and press to commit. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title “Fix the fsdp cmd in llm_sparsity” succinctly and accurately summarizes the primary change, namely correcting the FSDP command in the llm_sparsity example. It is clear, concise, and directly reflects the content of the pull request without extraneous wording.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb70a56 and 56232ad.

📒 Files selected for processing (2)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_sparsity/launch_finetune.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs


@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39f81e5 and e2c95a5.

📒 Files selected for processing (1)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: build-docs
  • GitHub Check: code-quality

Comment on lines +94 to +95
--fsdp full_shard auto_wrap \
--fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \

⚠️ Potential issue | 🔴 Critical

Keep the quotes on multi-word argument values.

Both --fsdp and --fsdp_transformer_layer_cls_to_wrap feed directly into finetune.py, whose argparse definitions (via transformers.TrainingArguments) expect a single string per flag. With the quotes removed, the shell now tokenizes auto_wrap as a separate argument, so the script sees --fsdp full_shard auto_wrap and errors with “unrecognized arguments: auto_wrap”. Same risk applies to the transformer layer flag. Please restore the quoting (or otherwise rejoin the values into a single token) to keep the command working.

🤖 Prompt for AI Agents
In examples/llm_sparsity/launch_finetune.sh around lines 94 to 95, the multi-word values for --fsdp and --fsdp_transformer_layer_cls_to_wrap were split into separate shell tokens; restore quoting (e.g., "--fsdp 'full_shard auto_wrap'" and "--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer'") or otherwise join each multi-word value into a single token so argparse receives one string per flag.
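
To see the tokenization behavior described above in isolation, here is a minimal bash sketch; the count_args helper is hypothetical and only shows how the shell splits arguments before any Python argparse code ever runs:

    # Prints how many tokens the callee actually receives, one per line
    count_args() {
        echo "argc=$#"
        printf 'argv: %s\n' "$@"
    }

    count_args --fsdp 'full_shard auto_wrap'   # argc=2: one flag plus one value string
    count_args --fsdp full_shard auto_wrap     # argc=3: auto_wrap arrives as an extra token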

@codecov codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.36%. Comparing base (5b02483) to head (56232ad).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #417      +/-   ##
==========================================
- Coverage   73.36%   73.36%   -0.01%     
==========================================
  Files         180      180              
  Lines       17919    17919              
==========================================
- Hits        13147    13146       -1     
- Misses       4772     4773       +1     

☔ View full report in Codecov by Sentry.

@kaix-nv kaix-nv enabled auto-merge (squash) October 9, 2025 18:48
copy-pr-bot bot commented Oct 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaix-nv (Contributor, Author) commented Oct 9, 2025

@RalphMao @realAsma Could I get a codeowner approval? Thanks.

@kevalmorabia97 (Collaborator) commented:

@kaix-nv seems like your commits are not verified

@kaix-nv kaix-nv enabled auto-merge (squash) October 10, 2025 17:35
@kaix-nv kaix-nv force-pushed the kaix/fsdp_fix branch 4 times, most recently from 9ab1652 to eb70a56 on October 10, 2025 18:29
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) October 10, 2025 18:53
@kevalmorabia97 kevalmorabia97 merged commit 08fb23f into main Oct 10, 2025
27 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kaix/fsdp_fix branch October 10, 2025 19:01
