
Conversation

Contributor

@AAnoosheh AAnoosheh commented Oct 2, 2025

What does this PR do?

Type of change: Minor tweak

Overview: Allow a DistillationConfig instance to be passed directly to setup_distillation_config, in addition to a YAML file path or None.

Usage

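A minimal sketch of the three supported call modes, based on the walkthrough and review below. student_cfg and teacher_cfg are hypothetical stand-ins for the student and teacher model configs, which this PR does not construct:

from modelopt.torch.distill.plugins.megatron import (
    DistillationConfig,
    setup_distillation_config,
)

student_cfg = teacher_cfg = None  # placeholders; pass real Megatron model configs

# 1) None: falls back to the default logits-only distillation mode for GPT models.
cfg = setup_distillation_config(None, student_cfg, teacher_cfg)

# 2) A DistillationConfig instance, used directly (new in this PR).
cfg = setup_distillation_config(DistillationConfig(), student_cfg, teacher_cfg)

# 3) A path to a YAML file with distillation settings.
cfg = setup_distillation_config("distill.yaml", student_cfg, teacher_cfg)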

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow the Contributor guidelines and that your commits are signed.
  • Is this change backward compatible?: Y
  • Did you write any new necessary tests?: N
  • Did you add or update any necessary documentation?: N
  • Did you update Changelog?: N

Additional Information

Summary by CodeRabbit

  • New Features

    • More flexible distillation configuration: you can now provide a config object directly, a file path, or omit it to use sensible defaults.
  • Refactor

    • Unified the distillation configuration setup into a single, streamlined API for simpler workflows.
    • The previous configuration loader was replaced; projects may need to update calls to use the new API.

@AAnoosheh AAnoosheh self-assigned this Oct 2, 2025
@AAnoosheh AAnoosheh requested a review from a team as a code owner October 2, 2025 18:03
@AAnoosheh AAnoosheh requested a review from realAsma October 2, 2025 18:03

coderabbitai bot commented Oct 2, 2025

Walkthrough

Renames and reworks the distillation config loader to accept a path, a DistillationConfig instance, or None. Implements branching: None yields a default config, a provided DistillationConfig is used directly, or a YAML path is loaded and parsed. Updates docstrings and replaces the old function.
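
Since the YAML branch simply unpacks the yaml.safe_load result into DistillationConfig(**...), a config file is just a mapping of DistillationConfig's constructor fields. A minimal round-trip sketch (an empty mapping, which falls back to the dataclass defaults; real files would set actual fields, which this PR does not enumerate):

import yaml
from modelopt.torch.distill.plugins.megatron import DistillationConfig

# Simplest valid config file: an empty mapping, so DistillationConfig(**{})
# uses its defaults, mirroring the None branch.
with open("distill.yaml", "w") as f:
    yaml.safe_dump({}, f)

with open("distill.yaml") as f:
    loaded = yaml.safe_load(f) or {}
cfg = DistillationConfig(**loaded)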

Changes

Cohort: Distillation config API update
File(s): modelopt/torch/distill/plugins/megatron.py
Summary: Replaced load_distillation_config(...) with setup_distillation_config(config_or_path, student_cfg, teacher_cfg). The new input contract supports None (default), a direct DistillationConfig, or a YAML path. Branching and docstrings were updated; the return type remains DistillationConfig.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Caller
  participant M as setup_distillation_config
  participant Y as YAML Loader

  C->>M: setup_distillation_config(config_or_path, student_cfg, teacher_cfg)

  alt config_or_path is None
    M-->>C: Default DistillationConfig
  else config_or_path is DistillationConfig
    M-->>C: Provided DistillationConfig (as-is)
  else config_or_path is str (path)
    M->>Y: Load & parse YAML
    Y-->>M: Parsed settings
    M-->>C: DistillationConfig from YAML
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through configs, nimble and bright,
Paths, objects, or nothing—still set it right.
Three doors to distill, one flask to fill,
YAML or default, I bounce at will.
Tap-tap my paws—new name, same thrill. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly and accurately summarizes the key change (allowing a DistillationConfig instance to be passed directly to the setup function), matching the main modification in the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@AAnoosheh AAnoosheh enabled auto-merge (squash) October 2, 2025 18:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
modelopt/torch/distill/plugins/megatron.py (2)

100-109: Clarify the docstring for the new input modes.

The docstring update mentions "the incomplete config itself" which is unclear. Consider revising to explicitly document all three input modes:

Apply this diff to improve the docstring:

-    """Read the distillation yaml config file specified by ``args.export_kd_cfg``.
+    """Setup distillation configuration from various input sources.
 
     Args:
-        config_or_path: Path to user-defined distillation settings yaml file, or the incomplete config itself.
-            If `None`, uses default logits-only distillation mode for GPT models.
+        config_or_path: One of:
+            - `None`: Uses default logits-only distillation mode for GPT models.
+            - `DistillationConfig`: Uses the provided config instance directly.
+            - `str`: Path to a YAML file containing distillation settings.
         student_cfg: Model config for student model.
         teacher_cfg: Model config for teacher model.

115-118: Add error handling for file operations and YAML parsing.

The code opens a file and parses YAML without error handling. If the path doesn't exist, the file is malformed, or the YAML contains invalid fields for DistillationConfig, the function will raise an unhandled exception.

Apply this diff to add basic error handling:

     else:
-        with open(config_or_path) as f:
-            cfg = yaml.safe_load(f)
-        cfg = DistillationConfig(**cfg)
+        try:
+            with open(config_or_path) as f:
+                cfg_dict = yaml.safe_load(f)
+            cfg = DistillationConfig(**cfg_dict)
+        except FileNotFoundError:
+            raise FileNotFoundError(f"Distillation config file not found: {config_or_path}")
+        except yaml.YAMLError as e:
+            raise ValueError(f"Invalid YAML in distillation config: {e}")
+        except TypeError as e:
+            raise ValueError(f"Invalid distillation config fields: {e}")
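
If this diff is applied, callers can distinguish the failure modes; a quick sketch reusing the placeholder names from the usage snippet in the PR description ("missing.yaml" is an illustrative path):

try:
    cfg = setup_distillation_config("missing.yaml", student_cfg, teacher_cfg)
except FileNotFoundError:
    ...  # the YAML path does not exist
except ValueError:
    ...  # malformed YAML, or fields DistillationConfig does not accept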
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb44c55 and 5ca1e25.

📒 Files selected for processing (1)
  • modelopt/torch/distill/plugins/megatron.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
modelopt/torch/distill/plugins/megatron.py (1)

95-99: Rename verified: no remaining references to load_distillation_config. All internal callers are updated.

Comment on lines +110 to +118
if config_or_path is None:
    logger.warning("Distillation config not provided. Using default.")
    cfg = DistillationConfig()
elif isinstance(config_or_path, DistillationConfig):
    cfg = config_or_path
else:
    with open(config_or_path) as f:
        cfg = yaml.safe_load(f)
    cfg = DistillationConfig(**cfg)


⚠️ Potential issue | 🟡 Minor

Consider the implications of mutating the input DistillationConfig.

When a DistillationConfig instance is passed in (line 114), the function directly assigns it to cfg and later mutates it by setting cfg.criterion (line 146) and cfg.loss_balancer (line 147). This means the caller's original config object will be modified, which may be unexpected if they intend to reuse the config.

Consider one of these approaches (a short sketch of the aliasing hazard follows the diffs below):

  1. Document the mutation in the docstring:
         config_or_path: One of:
             - `None`: Uses default logits-only distillation mode for GPT models.
-            - `DistillationConfig`: Uses the provided config instance directly.
+            - `DistillationConfig`: Uses the provided config instance directly (will be modified in-place).
             - `str`: Path to a YAML file containing distillation settings.
  2. Create a copy to avoid mutating the input:
     elif isinstance(config_or_path, DistillationConfig):
-        cfg = config_or_path
+        from copy import deepcopy
+        cfg = deepcopy(config_or_path)
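
For illustration, a minimal sketch of the aliasing hazard under the current pass-through behavior (placeholder names as in the usage snippet in the PR description):

my_cfg = DistillationConfig()
cfg = setup_distillation_config(my_cfg, student_cfg, teacher_cfg)
assert cfg is my_cfg  # same object: the input is aliased, not copied
# The function later assigns cfg.criterion and cfg.loss_balancer (lines 146-147),
# so my_cfg is mutated as well; a deepcopy would keep the caller's object pristine.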
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
if config_or_path is None:
    logger.warning("Distillation config not provided. Using default.")
    cfg = DistillationConfig()
elif isinstance(config_or_path, DistillationConfig):
    cfg = config_or_path
else:
    with open(config_or_path) as f:
        cfg = yaml.safe_load(f)
    cfg = DistillationConfig(**cfg)

After:
if config_or_path is None:
    logger.warning("Distillation config not provided. Using default.")
    cfg = DistillationConfig()
elif isinstance(config_or_path, DistillationConfig):
    from copy import deepcopy
    cfg = deepcopy(config_or_path)
else:
    with open(config_or_path) as f:
        cfg = yaml.safe_load(f)
    cfg = DistillationConfig(**cfg)
🤖 Prompt for AI Agents
In modelopt/torch/distill/plugins/megatron.py around lines 110 to 118, the
function assigns a passed-in DistillationConfig instance directly to cfg and
later mutates it, which unintentionally alters the caller's object; to fix this,
create a shallow or deep copy of the incoming DistillationConfig (e.g., via
copy.deepcopy or a provided copy/clone/from_dict constructor) and assign that
copy to cfg before any mutations so the original remains unchanged, or
alternatively document in the function docstring that the input config will be
mutated if that behavior is intended.

Copy link

codecov bot commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (cb44c55) to head (5ca1e25).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #399   +/-   ##
=======================================
  Coverage   73.79%   73.79%           
=======================================
  Files         171      171           
  Lines       17591    17591           
=======================================
  Hits        12982    12982           
  Misses       4609     4609           

☔ View full report in Codecov by Sentry.

@AAnoosheh AAnoosheh merged commit adcb1a1 into main Oct 2, 2025
27 checks passed
@AAnoosheh AAnoosheh deleted the aanoosheh/kd-direct-config branch October 2, 2025 19:42
kevalmorabia97 pushed a commit that referenced this pull request Oct 3, 2025