
Commit df01ca7

feat: add support for building images using vllm from private repos (#1605)
Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 52cebdf commit df01ca7


3 files changed, 80 insertions(+), 3 deletions(-)


docker/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -85,7 +85,7 @@ COPY --from=nemo-rl tools/build-custom-vllm.sh ./tools/build-custom-vllm.sh
 COPY --from=nemo-rl --link research/ ./research/
 COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/

-RUN <<"EOF" bash -exu
+RUN --mount=type=ssh <<"EOF" bash -exu
 uv venv --seed
 if [[ -n "${BUILD_CUSTOM_VLLM:-}" ]]; then
 bash tools/build-custom-vllm.sh
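
Per the Dockerfile reference, an SSH mount is optional unless `required=true` is set, so the changed `RUN` step should still work for public HTTPS builds invoked without `--ssh`; the forwarded agent only matters when the vLLM URL is fetched over SSH. Below is a minimal sketch, assuming a POSIX-compatible shell, of how a build wrapper might decide whether to pass `--ssh default`. `is_ssh_git_url` and `ssh_build_flag` are hypothetical helpers, not part of NeMo RL or this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of NeMo RL.

# Succeed (exit 0) if git would fetch this URL over SSH.
is_ssh_git_url() {
  local url=$1 host_part
  case $url in
    ssh://*) return 0 ;;  # explicit scheme, the only SSH form that can carry a port
    *://*) return 1 ;;    # https://, http://, git://, file:// ...
  esac
  # scp-like syntax [user@]host:path. Git only reads a URL this way when no
  # slash appears before the first colon, so /a:b stays a local path.
  host_part=${url%%:*}
  [ "$host_part" != "$url" ] || return 1  # no colon at all
  [ -n "$host_part" ] || return 1         # ":path" names no host
  case $host_part in
    */*) return 1 ;;
  esac
  return 0
}

# Print the buildx flag that forwards the local agent, or nothing for non-SSH URLs.
ssh_build_flag() {
  if is_ssh_git_url "$1"; then
    printf '%s\n' "--ssh default"
  fi
}
```

A wrapper could then call `docker buildx build $(ssh_build_flag "$url") ...`, leaving the substitution unquoted so that `--ssh` and `default` become separate arguments.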

docs/guides/use-custom-vllm.md

Lines changed: 76 additions & 1 deletion

@@ -78,4 +78,79 @@ docker buildx build \
 --tag <registry>/nemo-rl:latest \
 --push \
 .
-```
+```
+
+### SSH Setup for Private Repositories
+
+If your custom vLLM is hosted in a **private repository** (e.g., internal GitLab), you need to set up SSH agent forwarding for Docker to clone it during the build.
+
+#### Prerequisites
+1. Your SSH key must be registered on the Git server (GitLab/GitHub)
+2. The key must **not be expired** - check your Git server's SSH key settings
+3. The key must be loaded into your local ssh-agent
+
+#### Step 1: Verify your SSH key works
+
+```sh
+# For GitLab (adjust host/port as needed)
+ssh -T [email protected] -p 12051
+
+# You should see: "Welcome to GitLab, @username!"
+# If you see "Your SSH key has expired", renew it on the server
+```
+
+#### Step 2: Load your SSH key into the agent
+
+```sh
+# Check if an ssh-agent is already running
+echo $SSH_AUTH_SOCK
+
+# If empty, start one (this also sets SSH_AUTH_SOCK which `docker buildx` expects to be set when using `--ssh default`)
+eval "$(ssh-agent -s)"
+
+# Clear any old/expired keys from the agent
+ssh-add -D
+
+# Add your SSH key (use the key registered on your Git server)
+ssh-add ~/.ssh/id_ed25519
+
+# Verify it's loaded
+ssh-add -l
+```
+
+#### Step 3: Run the Docker build with SSH forwarding
+
+```sh
+docker buildx build \
+--build-arg BUILD_CUSTOM_VLLM=1 \
+--target release \
+--build-context nemo-rl=. \
+-f docker/Dockerfile \
+--ssh default \
+--tag <registry>/nemo-rl:latest \
+--push \
+.
+```
+
+## Running Applications with a Custom vLLM Container
+
+When using a container built with custom vLLM, **use the frozen environment workflow** (bare `python`) instead of `uv run` with `NRL_FORCE_REBUILD_VENVS=true`.
+
+```sh
+# Recommended: use bare python (frozen environment)
+python examples/run_grpo_math.py
+
+# NOT recommended with custom vLLM containers:
+# uv run examples/run_grpo_math.py
+# or
+# NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo_math.py
+```
+
+### Why Not Use `uv run` or Rebuild Venvs?
+
+Rebuilding worker virtual environments (via `uv run` or `NRL_FORCE_REBUILD_VENVS=true`) requires having the custom vLLM compiled locally. However, compiling vLLM requires a container environment with the correct CUDA toolchain—creating a chicken-and-egg problem.
+
+The container already has vLLM built and cached in the frozen environments. Using bare `python` leverages these pre-built environments directly, avoiding the need to recompile vLLM at runtime.
+
+> [!TIP]
+> For more details on frozen environments and how they differ from `uv run`, see the [Dependency Management](../design-docs/dependency-management.md#frozen-environments) documentation.
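
Steps 1 and 2 of the new guide can be folded into a preflight check that runs before `docker buildx build --ssh default`. Below is a minimal sketch, assuming OpenSSH's `ssh-add`, whose documented exit status is 0 on success, 1 when the requested command fails (for `-l`, an agent holding no identities), and 2 when it cannot contact the agent. `check_ssh_agent` is a hypothetical name, not part of NeMo RL.

```shell
#!/usr/bin/env bash
# Hypothetical preflight, not part of NeMo RL. Returns 0 only if an agent is
# reachable and holds at least one key.
check_ssh_agent() {
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    echo "SSH_AUTH_SOCK is not set; start an agent with: eval \"\$(ssh-agent -s)\"" >&2
    return 1
  fi
  if ! command -v ssh-add >/dev/null 2>&1; then
    echo "ssh-add not found; install an OpenSSH client" >&2
    return 1
  fi
  # ssh-add -l: 0 = keys listed, 1 = agent has no identities, 2 = agent unreachable.
  # Capture the status with || so the check is safe under `set -e`.
  local status=0
  ssh-add -l >/dev/null 2>&1 || status=$?
  case $status in
    0) return 0 ;;
    1) echo "agent at $SSH_AUTH_SOCK has no keys; run: ssh-add ~/.ssh/id_ed25519" >&2 ;;
    *) echo "cannot contact an agent at $SSH_AUTH_SOCK" >&2 ;;
  esac
  return 1
}

# Usage: check_ssh_agent && docker buildx build --ssh default ...
```

This only inspects the local agent. A key that has expired on the Git server still passes, which is why the `ssh -T` check in Step 1 remains useful.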

tools/build-custom-vllm.sh

Lines changed: 3 additions & 1 deletion

@@ -41,7 +41,9 @@ echo " Vllm Wheel location: $VLLM_PRECOMPILED_WHEEL_LOCATION"

 # Clone the repository
 echo "Cloning repository..."
-git clone "$GIT_URL" "$BUILD_DIR"
+# When running inside Docker with --mount=type=ssh, the known_hosts file is empty.
+# Skip host key verification for internal builds (only applies to SSH URLs).
+GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" git clone "$GIT_URL" "$BUILD_DIR"
 cd "$BUILD_DIR"
 git checkout "$GIT_REF"
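
The script disables host-key checking because the build container starts with an empty `known_hosts`. If a trusted `known_hosts` file for the Git server could be supplied to the build (how it would reach the container is an assumption outside this commit), the same `GIT_SSH_COMMAND` hook could verify the server's key instead. Below is a minimal sketch. `vllm_git_ssh_command` and the `KNOWN_HOSTS_FILE` variable are hypothetical and not part of `build-custom-vllm.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not part of build-custom-vllm.sh.
# Print the ssh command git should use for the clone. With a non-empty
# known_hosts file, keep strict host-key checking; otherwise fall back to the
# relaxed policy this commit uses.
vllm_git_ssh_command() {
  local known_hosts=${1:-}
  if [ -n "$known_hosts" ] && [ -s "$known_hosts" ]; then
    # Git runs GIT_SSH_COMMAND through a shell, so a path with spaces would need quoting.
    printf '%s\n' "ssh -o StrictHostKeyChecking=yes -o UserKnownHostsFile=$known_hosts"
  else
    printf '%s\n' "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
  fi
}

# Inside the build script this would replace the hard-coded value
# (KNOWN_HOSTS_FILE is a hypothetical variable):
# GIT_SSH_COMMAND="$(vllm_git_ssh_command "${KNOWN_HOSTS_FILE:-}")" git clone "$GIT_URL" "$BUILD_DIR"
```

Keeping the fallback identical to the commit's command means behavior only changes when a known_hosts file is actually provided.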
