
fix num_labels= 1 test fail #3493

Open
ved1beta wants to merge 5 commits into axolotl-ai-cloud:main from ved1beta:trl_num_lables

Conversation

ved1beta (Contributor) commented Mar 13, 2026

Description

Add a set_reward_model_defaults model validator in validation.py that automatically sets num_labels=1 when reward_model: true is set and num_labels=2 when process_reward_model: true is set, and defaults model_type to the appropriate AutoModel class.
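Based on the description above, the defaulting logic might look roughly like the following standalone sketch. This is a plain function over a raw config dict, not the actual pydantic validator on TrainingValidationMixin, and the specific AutoModel class names used as defaults are assumptions:

```python
# Sketch of the defaulting behavior described in the PR, written as a
# plain function for illustration; the real code is a pydantic
# @model_validator(mode="before") on TrainingValidationMixin.

def set_reward_model_defaults(data: dict) -> dict:
    """Fill in num_labels/model_type defaults for reward-model configs."""
    if data.get("reward_model"):
        if data.get("num_labels") is None:
            data["num_labels"] = 1  # single scalar reward score
        if data.get("model_type") is None:
            # Assumed default class name; the PR only says
            # "the appropriate AutoModel class".
            data["model_type"] = "AutoModelForSequenceClassification"
    if data.get("process_reward_model"):
        if data.get("num_labels") is None:
            data["num_labels"] = 2  # per-step labels for process rewards
        if data.get("model_type") is None:
            data["model_type"] = "AutoModelForTokenClassification"
    return data
```

Note that explicitly provided values win: the `is None` guards mean a user-supplied num_labels or model_type is never overwritten.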

Re-enable rm_cfg in test_builder_w_rm_trainers, which was previously disabled pending this fix.

How has this been tested?

  • test_reward_model_defaults
  • test_process_reward_model_defaults

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic default configuration for reward model settings, intelligently setting model type and label counts based on configuration context.
  • Tests

    • Enhanced validation test coverage for reward model defaults to ensure configuration reliability.

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9938837a-40bf-47f4-a7a7-3060b93b64e9

📝 Walkthrough

A new pre-validation hook was added to automatically set default values for reward model configurations. When reward_model or process_reward_model flags are present, missing num_labels and model_type fields are populated with appropriate defaults. Test coverage was extended to validate this behavior.

Changes

  • Validation hook (src/axolotl/utils/schemas/validation.py): Added a set_reward_model_defaults() method to TrainingValidationMixin that sets num_labels defaults (1 for reward_model, 2 for process_reward_model) and model_type to the appropriate AutoModel classes when not provided.
  • Test coverage (tests/core/test_builders.py, tests/patched/test_validation.py): Uncommented the reward model fixture in the builder test. Added two new test methods validating default values for reward_model and process_reward_model configurations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: 1 passed, 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check (❓ Inconclusive): The title "fix num_labels= 1 test fail" is vague and lacks specific context about what is being fixed. It mentions "num_labels" and "test fail" but doesn't clearly convey the actual implementation of adding reward model default validators. Consider a more descriptive title such as "Add reward model default validators for num_labels and model_type" or "Auto-set num_labels and model_type for reward models" to better convey the core change.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/axolotl/utils/schemas/validation.py (1)

260-270: Consider adding mutual exclusivity validation for reward_model and process_reward_model.

If both flags are set simultaneously, process_reward_model values will silently override reward_model defaults. While this edge case is unlikely in practice, you may want to add explicit validation to catch misconfiguration early.

♻️ Optional enhancement to validate mutual exclusivity:

```diff
 @model_validator(mode="before")
 @classmethod
 def set_reward_model_defaults(cls, data):
+    if data.get("reward_model") and data.get("process_reward_model"):
+        raise ValueError(
+            "reward_model and process_reward_model are mutually exclusive"
+        )
+
     if data.get("reward_model"):
         if data.get("num_labels") is None:
             data["num_labels"] = 1
```
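The suggested guard can be exercised as a standalone sketch (a hypothetical helper separate from the actual pydantic mixin), which shows the failure mode the reviewer wants to catch:

```python
# Standalone sketch of the reviewer's suggested mutual-exclusivity guard.
# check_reward_model_flags is a hypothetical helper, not axolotl code.

def check_reward_model_flags(data: dict) -> dict:
    """Reject configs that enable both reward-model modes at once."""
    if data.get("reward_model") and data.get("process_reward_model"):
        raise ValueError(
            "reward_model and process_reward_model are mutually exclusive"
        )
    return data


if __name__ == "__main__":
    # A config with only one flag passes through unchanged.
    check_reward_model_flags({"reward_model": True})
    # Setting both flags now fails loudly instead of silently letting
    # process_reward_model defaults override reward_model ones.
    try:
        check_reward_model_flags(
            {"reward_model": True, "process_reward_model": True}
        )
    except ValueError as err:
        print(err)
```

Running the guard before the defaulting logic means misconfigured YAML fails at validation time rather than producing a model with surprising num_labels.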
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/axolotl/utils/schemas/validation.py around lines 260-270, add a
mutual-exclusivity check before the existing defaulting logic so that
"reward_model" and "process_reward_model" cannot both be set: inspect the
incoming data dict at the top of the block that sets defaults for
reward_model/process_reward_model and, if both data.get("reward_model") and
data.get("process_reward_model") are truthy, raise a validation error (e.g.,
ValueError or the module's ValidationError) with a clear message; keep the rest
of the defaulting code for "model_type" and "num_labels" unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aae302b6-5fcf-4ae3-97b9-144cb6dfddba

📥 Commits

Reviewing files that changed from the base of the PR and between 083c5a0 and 70cf3f4.

📒 Files selected for processing (3)
  • src/axolotl/utils/schemas/validation.py
  • tests/core/test_builders.py
  • tests/patched/test_validation.py

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
