Skip to content

Conversation

@ananthsub
Copy link
Contributor

@ananthsub ananthsub commented Jan 12, 2026

What does this PR do ?

Part of NVIDIA-NeMo/Gym#292

This PR documents the NeMo RL + Gym integration, which includes:

  1. The Ray actor bridge code in RL that initializes & launches Gym, and how Gym re-uses the Ray cluster info
  2. How RL prepares its vLLM servers for Gym to proxy through to, so inference logic is contained within RL
  3. The training loop flow for how RL sends request data to Gym and how the data is translated between Gym and RL formats

Issues

NVIDIA-NeMo/Gym#292

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation
    • Added comprehensive design documentation for NeMo Gym integration covering the complete system architecture, actor components, HTTP proxy configuration, initialization procedures, full training loop control flow, data translation specifications, tokenization requirements, API endpoint definitions, and integration workflows. Includes detailed visual architectural diagrams throughout.

Signed-off-by: Ananth Subramaniam <[email protected]>
@ananthsub ananthsub force-pushed the docs-rl-gym-diagram branch from 9527953 to 9ea9f1a Compare February 2, 2026 17:39
@ananthsub ananthsub changed the title [docs] Document Gym + RL integration design docs: Document Gym + RL integration design Feb 2, 2026
@ananthsub ananthsub marked this pull request as ready for review February 2, 2026 21:10
@ananthsub ananthsub requested a review from a team as a code owner February 2, 2026 21:10
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough

Walkthrough

Two documentation files were added: a new design document describing the NeMo Gym integration architecture, initialization sequence, training loop, data formats, and tokenization with Mermaid diagrams; and an update to the documentation index to include the new design document in the navigation structure.

Changes

Cohort / File(s) Summary
Design Documentation
docs/design-docs/nemo-gym-integration.md, docs/index.md
New design documentation file for NeMo Gym integration describing architecture (NemoGym Actor, vLLM HTTP proxy, rollouts flow), initialization sequence, training loop control flow, data format translation, and tokenization with visual diagrams. Documentation index updated to include the new design document.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding documentation for the Gym and RL integration design.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed PR contains only documentation changes with no code modifications, new features, or impact on numerics/performance, qualifying as a minor change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/design-docs/nemo-gym-integration.md`:
- Around line 22-23: Update the inline comments for the two config keys to
explicitly state their relationship: note that async_engine and
expose_http_server are independent settings but both must be enabled to support
the HTTP server; e.g., change the comment on async_engine to clarify it enables
the async worker/runtime and the comment on expose_http_server to state it
controls whether the HTTP server (exposing /v1/chat/completions) is started, and
add a combined comment line that both must be true to enable HTTP server
support.
🧹 Nitpick comments (1)
docs/design-docs/nemo-gym-integration.md (1)

184-184: Minor grammar refinement.

For consistency with the formal tone used throughout the documentation, consider revising "Results return out of order" to "Results are returned out of order".

📝 Suggested revision
-1. **Results return out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
+1. **Results are returned out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.

Comment on lines +22 to +23
async_engine: true # Required for HTTP server support
expose_http_server: true # Exposes /v1/chat/completions endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n -A10 -B2 'expose_http_server|async_engine' --type=py -g '!**/test/**' | head -100

Repository: NVIDIA-NeMo/RL

Length of output: 8349


🏁 Script executed:

rg -n 'expose_http_server' --type=py --type=md --type=yaml

Repository: NVIDIA-NeMo/RL

Length of output: 1511


🏁 Script executed:

sed -n '165,185p' nemo_rl/models/generation/vllm/vllm_worker_async.py

Repository: NVIDIA-NeMo/RL

Length of output: 962


🏁 Script executed:

sed -n '1,170p' nemo_rl/models/generation/vllm/vllm_worker_async.py | grep -A20 -B5 'async_engine\|AsyncLLM'

Repository: NVIDIA-NeMo/RL

Length of output: 1110


🏁 Script executed:

sed -n '1,50p' nemo_rl/models/generation/vllm/config.py

Repository: NVIDIA-NeMo/RL

Length of output: 2404


🏁 Script executed:

rg -n -B5 -A5 'expose_http_server.*async|async.*expose_http_server' --type=py

Repository: NVIDIA-NeMo/RL

Length of output: 1930


Clarify the relationship between async_engine and expose_http_server.

Both settings are required to enable HTTP server support, but they are independent: async_engine is inherent to the async worker, while expose_http_server controls whether the HTTP server is actually started. Consider updating the comments to make this clearer, e.g., "Both required for HTTP server support" rather than leaving the relationship ambiguous.

🤖 Prompt for AI Agents
In `@docs/design-docs/nemo-gym-integration.md` around lines 22 - 23, Update the
inline comments for the two config keys to explicitly state their relationship:
note that async_engine and expose_http_server are independent settings but both
must be enabled to support the HTTP server; e.g., change the comment on
async_engine to clarify it enables the async worker/runtime and the comment on
expose_http_server to state it controls whether the HTTP server (exposing
/v1/chat/completions) is started, and add a combined comment line that both must
be true to enable HTTP server support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant