feat: custom logits processor #2093

Closed
cmunley1 wants to merge 1 commit into main from cmunley/gym-logits-processor

@cmunley1 cmunley1 commented Mar 10, 2026

What does this PR do?

Allow custom logits processors in vLLM.


Summary by CodeRabbit

  • New Features
    • Added per-request thinking budget configuration with customizable grace periods and end token specifications
    • Enabled runtime configuration and dynamic loading of custom logits processors
    • Introduced environment variable support for logits processor setup

Signed-off-by: cmunley1 <cmunley@nvidia.com>
@cmunley1 cmunley1 requested review from a team as code owners March 10, 2026 06:11
copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cmunley1 cmunley1 closed this Mar 10, 2026
coderabbitai bot commented Mar 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc38c04d-a310-4f01-b190-c1fb19d41463

📥 Commits

Reviewing files that changed from the base of the PR and between 280d3aa and 329c5dc.

📒 Files selected for processing (4)
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/vllm/config.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/models/generation/vllm/vllm_worker.py

📝 Walkthrough

This change adds support for custom logits processors in the vLLM generation pipeline. It extends the configuration schema to specify logits processor classes and environment variables, adds dynamic class loading during worker initialization, and passes per-sample thinking budget arguments to generation requests through metadata injection.
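As a rough illustration of the schema extension described above, the sketch below models only the two new fields. The class name `VllmSpecificArgsSketch`, the processor path, and the environment variable name are hypothetical placeholders; the real `VllmSpecificArgs` definition in `config.py` is not shown on this page and may differ.

```python
from typing import TypedDict


class VllmSpecificArgsSketch(TypedDict, total=False):
    """Hypothetical stand-in for the two fields added to VllmSpecificArgs."""

    # Processor classes as "module_path:ClassName" strings.
    logits_processors: list[str]
    # Environment variables to propagate for the processors.
    logits_processor_env_vars: dict[str, str]


# Example values are made up for illustration only.
example_args: VllmSpecificArgsSketch = {
    "logits_processors": ["my_package.processors:ThinkingBudgetProcessor"],
    "logits_processor_env_vars": {"MY_PROCESSOR_LOG_LEVEL": "info"},
}
```

Both fields are optional in this sketch (`total=False`), on the assumption that existing configs without logits processors should keep working unchanged.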

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Schema: nemo_rl/models/generation/vllm/config.py | Added two new fields to VllmSpecificArgs: logits_processors (list of processor class strings) and logits_processor_env_vars (environment variable mapping). |
| Worker Initialization & Integration: nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_generation.py | Implemented _load_logits_processor_classes() static method for dynamic class loading from configuration strings (format: module_path:ClassName). Integrated logits processor loading and environment variable propagation across worker configuration, speculative decoding patching, and main initialization paths. |
| Per-Sample Request Metadata: nemo_rl/experience/rollouts.py | Added up-front parsing of global logits processor defaults from policy config. Injects per-sample vllm_xargs into generation request metadata, combining per-sample thinking budget values with global defaults (thinking_budget_grace_period, end_token_ids), serialized into extra_body field. |
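The `module_path:ClassName` loading and environment variable propagation summarized above could look roughly like the following. This is a minimal sketch, not the PR's `_load_logits_processor_classes()`: the function names, error handling, and the use of `os.environ.update` are assumptions.

```python
import importlib
import os


def load_logits_processor_classes(specs):
    """Resolve "module_path:ClassName" strings into class objects (sketch)."""
    classes = []
    for spec in specs:
        module_path, sep, class_name = spec.partition(":")
        if not sep or not module_path or not class_name:
            raise ValueError(
                f"Expected 'module_path:ClassName', got {spec!r}"
            )
        # Import the module, then look the class up by name on it.
        module = importlib.import_module(module_path)
        classes.append(getattr(module, class_name))
    return classes


def apply_logits_processor_env_vars(env_vars):
    """Copy the configured mapping into this process's environment (sketch)."""
    os.environ.update({key: str(value) for key, value in env_vars.items()})


# Usage with a standard-library class, so the example runs anywhere.
classes = load_logits_processor_classes(["collections:OrderedDict"])
apply_logits_processor_env_vars({"EXAMPLE_PROCESSOR_FLAG": "1"})
```

Resolving classes inside the worker, rather than passing class objects through the config, keeps the configuration a plain list of strings.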

Sequence Diagram(s)

sequenceDiagram
    participant Config as Policy Config
    participant Worker as VLLMWorker
    participant Rollout as Rollout Generator
    participant Engine as vLLM Engine

    Config->>Worker: Initialize with logits_processors config
    Worker->>Worker: Load custom logits processor classes<br/>(module_path:ClassName)
    Worker->>Worker: Set environment variables from<br/>logits_processor_env_vars
    Worker->>Engine: Initialize with loaded logits_processors
    
    Rollout->>Rollout: Parse global logits processor defaults<br/>from policy config
    Rollout->>Rollout: For each row with thinking_budget:<br/>build per-sample vllm_xargs
    Rollout->>Engine: Submit request with per-sample args<br/>in extra_body metadata
    Engine->>Engine: Apply logits processors with<br/>per-sample budget overrides
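The rollout-side steps in the diagram (merging global defaults with a per-sample budget) can be sketched as below. The function name, the row layout, and the exact shape of the returned dictionary are assumptions; only the names `vllm_xargs`, `extra_body`, `thinking_budget_grace_period`, and `end_token_ids` come from the summary above, and how the PR serializes the result is not shown on this page.

```python
def build_extra_body(row, global_defaults):
    """Build a per-sample extra_body carrying vllm_xargs (sketch).

    Returns None for rows without a thinking budget, so those requests
    are sent without per-sample overrides.
    """
    budget = row.get("thinking_budget")
    if budget is None:
        return None
    # Start from the global defaults, then add the per-sample value.
    vllm_xargs = dict(global_defaults)
    vllm_xargs["thinking_budget"] = budget
    return {"vllm_xargs": vllm_xargs}


# Hypothetical global defaults parsed once from the policy config.
defaults = {"thinking_budget_grace_period": 32, "end_token_ids": [7]}

with_budget = build_extra_body({"thinking_budget": 512}, defaults)
without_budget = build_extra_body({"prompt": "hello"}, defaults)
```

Copying the defaults per row avoids one sample's budget leaking into another request through a shared dictionary.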

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


