feat: add support for building images using vllm from private repos (#1605)
Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
docs/guides/use-custom-vllm.md: 76 additions, 1 deletion
@@ -78,4 +78,79 @@ docker buildx build \
    --tag <registry>/nemo-rl:latest \
    --push \
    .
```
### SSH Setup for Private Repositories
If your custom vLLM is hosted in a **private repository** (e.g., internal GitLab), you need to set up SSH agent forwarding so that Docker can clone it during the build.

#### Prerequisites

1. Your SSH key must be registered on the Git server (GitLab/GitHub)
2. The key must **not be expired** (check your Git server's SSH key settings)
3. The key must be loaded into your local ssh-agent
# If you see "Your SSH key has expired", renew it on the server
```
#### Step 2: Load your SSH key into the agent

```sh
# Check if an ssh-agent is already running
echo "$SSH_AUTH_SOCK"

# If empty, start one (this also sets SSH_AUTH_SOCK, which `docker buildx` expects to be set when using `--ssh default`)
eval "$(ssh-agent -s)"

# Remove all keys currently held by the agent, including old or expired ones
ssh-add -D

# Add your SSH key (use the key registered on your Git server)
ssh-add ~/.ssh/id_ed25519

# Verify it's loaded
ssh-add -l
```
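As an optional sanity check after loading the key, the agent state can be tested from a script. The sketch below is a hypothetical helper (`check_ssh_agent` is not part of NeMo RL or Docker). It assumes only that `docker buildx build --ssh default` locates the agent through `SSH_AUTH_SOCK`, as the comment above notes, and it relies on the exit codes documented for `ssh-add -l`: 0 when identities are listed, 1 when the agent holds none, 2 when the agent cannot be contacted.

```sh
# Hypothetical helper (not part of NeMo RL): report whether ssh-agent is
# usable for `docker buildx build --ssh default`.
# Prints one of: ok, no-keys, no-agent. Returns 0, 1, or 2 respectively.
check_ssh_agent() {
    # `--ssh default` forwards the agent socket named by SSH_AUTH_SOCK
    if [ -z "${SSH_AUTH_SOCK:-}" ] || [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "no-agent"
        return 2
    fi
    # ssh-add -l: exit 0 = identities listed, 1 = agent has no identities,
    # 2 = cannot contact the agent (anything else is treated as no agent)
    ssh-add -l >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0) echo "ok"; return 0 ;;
        1) echo "no-keys"; return 1 ;;
        *) echo "no-agent"; return 2 ;;
    esac
}
```

Run `check_ssh_agent` after Step 2; if it prints anything other than `ok`, the SSH-based clone during the build is likely to fail.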
#### Step 3: Run the Docker build with SSH forwarding

```sh
docker buildx build \
    --build-arg BUILD_CUSTOM_VLLM=1 \
    --target release \
    --build-context nemo-rl=. \
    -f docker/Dockerfile \
    --ssh default \
    --tag <registry>/nemo-rl:latest \
    --push \
    .
```
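If you rebuild the image often, the Step 3 command can be wrapped in a small shell function that fails fast when no agent is configured and can print the exact invocation before running it. This is a hypothetical convenience wrapper, not something shipped with NeMo RL: the function name and the `DRY_RUN` variable are assumptions, while the `docker buildx build` flags are copied from the command above.

```sh
# Hypothetical wrapper (not part of NeMo RL) around the Step 3 build command.
# Usage: build_custom_vllm_image <registry>
# Set DRY_RUN=1 (an assumed convention) to print the command instead of running it.
build_custom_vllm_image() {
    if [ $# -ne 1 ] || [ -z "$1" ]; then
        echo "usage: build_custom_vllm_image <registry>" >&2
        return 64
    fi
    # `--ssh default` needs SSH_AUTH_SOCK to point at a running ssh-agent
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set; start ssh-agent and ssh-add your key first" >&2
        return 1
    fi
    # Replace this function's positional parameters with the full command;
    # "$1" (the registry) is expanded before `set --` takes effect
    set -- docker buildx build \
        --build-arg BUILD_CUSTOM_VLLM=1 \
        --target release \
        --build-context nemo-rl=. \
        -f docker/Dockerfile \
        --ssh default \
        --tag "$1/nemo-rl:latest" \
        --push \
        .
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Building the argument list with `set --` instead of a bash array keeps the sketch POSIX-`sh` compatible while still quoting the registry argument correctly.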
## Running Applications with a Custom vLLM Container

When using a container built with custom vLLM, **use the frozen environment workflow** (bare `python`) instead of `uv run`, with or without `NRL_FORCE_REBUILD_VENVS=true`.

```sh
# Recommended: use bare python (frozen environment)
python examples/run_grpo_math.py

# NOT recommended with custom vLLM containers:
# uv run examples/run_grpo_math.py
# or
# NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo_math.py
```
### Why Not Use `uv run` or Rebuild Venvs?

Rebuilding worker virtual environments (via `uv run` or `NRL_FORCE_REBUILD_VENVS=true`) requires the custom vLLM to be compiled locally. However, compiling vLLM requires a container environment with the correct CUDA toolchain, which creates a chicken-and-egg problem.

The container already has vLLM built and cached in its frozen environments. Running with bare `python` uses these pre-built environments directly and avoids recompiling vLLM at runtime.
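To make the frozen-environment recommendation harder to bypass by accident, a launcher can refuse to start when `NRL_FORCE_REBUILD_VENVS=true` is set and otherwise invoke the script with bare `python`. `run_frozen` is a hypothetical helper for illustration only, not part of NeMo RL; it encodes the guidance above and nothing more.

```sh
# Hypothetical launcher (not part of NeMo RL) for custom vLLM containers:
# always run in the frozen environment (bare `python`) and refuse a forced
# venv rebuild, which would require compiling the custom vLLM locally.
run_frozen() {
    if [ "${NRL_FORCE_REBUILD_VENVS:-}" = "true" ]; then
        echo "NRL_FORCE_REBUILD_VENVS=true would require recompiling the custom vLLM; unset it" >&2
        return 1
    fi
    python "$@"
}
```

For example, inside the custom vLLM container: `run_frozen examples/run_grpo_math.py`.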
154
+
155
+
> [!TIP]
156
+
> For more details on frozen environments and how they differ from `uv run`, see the [Dependency Management](../design-docs/dependency-management.md#frozen-environments) documentation.