docs: Document Gym + RL integration design #1762

ananthsub · 2026-01-12T12:57:02Z

What does this PR do ?

Part of NVIDIA-NeMo/Gym#292

This PR documents the NeMo RL + Gym integration, which includes:

The Ray actor bridge code in RL that initializes & launches Gym, and how Gym re-uses the Ray cluster info
How RL prepares its vLLM servers for Gym to proxy through to, so inference logic is contained within RL
The training loop flow for how RL sends request data to Gym and how the data is translated between Gym and RL formats

Issues

NVIDIA-NeMo/Gym#292

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Summary by CodeRabbit

Documentation
- Added comprehensive design documentation for NeMo Gym integration covering the complete system architecture, actor components, HTTP proxy configuration, initialization procedures, full training loop control flow, data translation specifications, tokenization requirements, API endpoint definitions, and integration workflows. Includes detailed visual architectural diagrams throughout.

Signed-off-by: Ananth Subramaniam <[email protected]>

coderabbitai · 2026-02-02T21:17:25Z

📝 Walkthrough

Walkthrough

Two documentation files were added: a new design document describing the NeMo Gym integration architecture, initialization sequence, training loop, data formats, and tokenization with Mermaid diagrams; and an update to the documentation index to include the new design document in the navigation structure.

Changes

Cohort / File(s)	Summary
Design Documentation `docs/design-docs/nemo-gym-integration.md`, `docs/index.md`	New design documentation file for NeMo Gym integration describing architecture (NemoGym Actor, vLLM HTTP proxy, rollouts flow), initialization sequence, training loop control flow, data format translation, and tokenization with visual diagrams. Documentation index updated to include the new design document.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main change: adding documentation for the Gym and RL integration design.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes	✅ Passed	PR contains only documentation changes with no code modifications, new features, or impact on numerics/performance, qualifying as a minor change.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/design-docs/nemo-gym-integration.md`:
- Around line 22-23: Update the inline comments for the two config keys to
explicitly state their relationship: note that async_engine and
expose_http_server are independent settings but both must be enabled to support
the HTTP server; e.g., change the comment on async_engine to clarify it enables
the async worker/runtime and the comment on expose_http_server to state it
controls whether the HTTP server (exposing /v1/chat/completions) is started, and
add a combined comment line that both must be true to enable HTTP server
support.

🧹 Nitpick comments (1)

docs/design-docs/nemo-gym-integration.md (1)
184-184: Minor grammar refinement.

For consistency with the formal tone used throughout the documentation, consider revising "Results return out of order" to "Results are returned out of order".
📝 Suggested revision
-1. **Results return out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
+1. **Results are returned out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.

coderabbitai · 2026-02-02T21:17:28Z

docs/design-docs/nemo-gym-integration.md

+      async_engine: true          # Required for HTTP server support
+      expose_http_server: true    # Exposes /v1/chat/completions endpoint


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n -A10 -B2 'expose_http_server|async_engine' --type=py -g '!**/test/**' | head -100

Repository: NVIDIA-NeMo/RL

Length of output: 8349

🏁 Script executed:

rg -n 'expose_http_server' --type=py --type=md --type=yaml

Repository: NVIDIA-NeMo/RL

Length of output: 1511

🏁 Script executed:

sed -n '165,185p' nemo_rl/models/generation/vllm/vllm_worker_async.py

Repository: NVIDIA-NeMo/RL

Length of output: 962

🏁 Script executed:

sed -n '1,170p' nemo_rl/models/generation/vllm/vllm_worker_async.py | grep -A20 -B5 'async_engine\|AsyncLLM'

Repository: NVIDIA-NeMo/RL

Length of output: 1110

🏁 Script executed:

sed -n '1,50p' nemo_rl/models/generation/vllm/config.py

Repository: NVIDIA-NeMo/RL

Length of output: 2404

🏁 Script executed:

rg -n -B5 -A5 'expose_http_server.*async|async.*expose_http_server' --type=py

Repository: NVIDIA-NeMo/RL

Length of output: 1930

Clarify the relationship between async_engine and expose_http_server.

Both settings are required to enable HTTP server support, but they are independent: async_engine is inherent to the async worker, while expose_http_server controls whether the HTTP server is actually started. Consider updating the comments to make this clearer, e.g., "Both required for HTTP server support" rather than leaving the relationship ambiguous.

🤖 Prompt for AI Agents

In `@docs/design-docs/nemo-gym-integration.md` around lines 22 - 23, Update the inline comments for the two config keys to explicitly state their relationship: note that async_engine and expose_http_server are independent settings but both must be enabled to support the HTTP server; e.g., change the comment on async_engine to clarify it enables the async worker/runtime and the comment on expose_http_server to state it controls whether the HTTP server (exposing /v1/chat/completions) is started, and add a combined comment line that both must be true to enable HTTP server support.

ananthsub requested a review from bxyu-nvidia January 12, 2026 12:57

ananthsub added the documentation Improvements or additions to documentation label Jan 12, 2026

ananthsub temporarily deployed to nemo-ci January 12, 2026 12:57 — with GitHub Actions Inactive

ananthsub temporarily deployed to nemo-ci January 12, 2026 13:00 — with GitHub Actions Inactive

[docs] Add gym + rl design integration

9ea9f1a

Signed-off-by: Ananth Subramaniam <[email protected]>

ananthsub force-pushed the docs-rl-gym-diagram branch from 9527953 to 9ea9f1a Compare February 2, 2026 17:39

ananthsub temporarily deployed to nemo-ci February 2, 2026 17:39 — with GitHub Actions Inactive

ananthsub changed the title ~~[docs] Document Gym + RL integration design~~ docs: Document Gym + RL integration design Feb 2, 2026

ananthsub temporarily deployed to nemo-ci February 2, 2026 18:06 — with GitHub Actions Inactive

ananthsub marked this pull request as ready for review February 2, 2026 21:10

ananthsub requested a review from a team as a code owner February 2, 2026 21:10

coderabbitai bot reviewed Feb 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: Document Gym + RL integration design #1762

docs: Document Gym + RL integration design #1762

ananthsub commented Jan 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 2, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		async_engine: true # Required for HTTP server support
		expose_http_server: true # Exposes /v1/chat/completions endpoint

docs: Document Gym + RL integration design #1762

Are you sure you want to change the base?

docs: Document Gym + RL integration design #1762

Conversation

ananthsub commented Jan 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 2, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ananthsub commented Jan 12, 2026 •

edited by coderabbitai bot

Loading