Commit 84f7f8d

[deployment] feat: support build docker image with aarch64 platform (verl-project#4605)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

This PR enables building the `verlai/verl:vllm012.exp` and `verlai/verl:sgl056.exp` images on aarch64 platforms such as GB200.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ... https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+GB200
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
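The description does not show how the images are built on aarch64, so here is a dry-run sketch of one possible invocation. It only prints the commands. `print_build_cmd` is a hypothetical helper, and both the pairing of tags with Dockerfiles and the use of the repository root as build context are assumptions rather than something the commit states.

```shell
#!/bin/sh
# Dry-run sketch: prints a `docker buildx build` command instead of running it.
# Assumptions (not from the commit): the tag-to-Dockerfile pairing below and
# the repository root (".") as the build context.
print_build_cmd() {
    dockerfile="$1"   # Dockerfile path, relative to the repository root
    tag="$2"          # image tag to produce
    echo "docker buildx build --platform linux/arm64 -f ${dockerfile} -t ${tag} ."
}

print_build_cmd docker/verl0.6.1-experimental/Dockerfile.vllm012exp verlai/verl:vllm012.exp
print_build_cmd docker/verl0.6.1-experimental/Dockerfile.sglang056exp verlai/verl:sgl056.exp
```

On a native aarch64 host such as a GB200 node, the `--platform` flag could likely be dropped, since the default platform already matches.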
1 parent 0af6a38 commit 84f7f8d

4 files changed: +28 -27 lines changed

docker/Dockerfile.stable.sglang

Lines changed: 8 additions & 8 deletions
@@ -14,17 +14,17 @@ RUN pip install --upgrade --no-cache-dir transformers tokenizers

 RUN pip install codetiming tensordict mathruler pylatexenc qwen_vl_utils

-RUN pip install --no-cache-dir --no-build-isolation --no-binary flash_attn==2.8.1
+RUN pip install --no-cache-dir --no-build-isolation flash_attn==2.8.1

-RUN wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
-apt-get update && apt-get install -y libxcb-cursor0
-
-RUN apt-get install -y ./nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+RUN NSIGHT_VERSION=2025.6.1_2025.6.1.190-1_$(if [ "$(uname -m)" = "aarch64" ]; then echo "arm64"; else echo "amd64"; fi) && \
+wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-${NSIGHT_VERSION}.deb && \
+apt-get update && apt-get install -y libxcb-cursor0 && \
+apt-get install -y ./nsight-systems-${NSIGHT_VERSION}.deb && \
 rm -rf /usr/local/cuda/bin/nsys && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys /usr/local/cuda/bin/nsys && \
 rm -rf /usr/local/cuda/bin/nsys-ui && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
-rm nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys-ui /usr/local/cuda/bin/nsys-ui && \
+rm nsight-systems-${NSIGHT_VERSION}.deb

 # sglang image has already installed DeepEP

docker/Dockerfile.stable.vllm

Lines changed: 6 additions & 5 deletions
@@ -50,14 +50,15 @@ RUN pip install codetiming tensordict mathruler pylatexenc qwen_vl_utils

 RUN pip install flash_attn==2.8.1 --no-build-isolation

-RUN wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+RUN NSIGHT_VERSION=2025.6.1_2025.6.1.190-1_$(if [ "$(uname -m)" = "aarch64" ]; then echo "arm64"; else echo "amd64"; fi) && \
+wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-${NSIGHT_VERSION}.deb && \
 apt-get update && apt-get install -y libxcb-cursor0 && \
-apt-get install -y ./nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+apt-get install -y ./nsight-systems-${NSIGHT_VERSION}.deb && \
 rm -rf /usr/local/cuda/bin/nsys && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys /usr/local/cuda/bin/nsys && \
 rm -rf /usr/local/cuda/bin/nsys-ui && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
-rm nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys-ui /usr/local/cuda/bin/nsys-ui && \
+rm nsight-systems-${NSIGHT_VERSION}.deb && \
 rm -rf /var/lib/apt/lists/*

 # =========================

docker/verl0.6.1-experimental/Dockerfile.sglang056exp

Lines changed: 7 additions & 7 deletions
@@ -15,15 +15,15 @@ RUN pip install codetiming tensordict mathruler pylatexenc qwen_vl_utils

 RUN pip install --no-cache-dir --no-build-isolation flash_attn==2.8.1

-RUN wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
-apt-get update && apt-get install -y libxcb-cursor0
-
-RUN apt-get install -y ./nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+RUN NSIGHT_VERSION=2025.6.1_2025.6.1.190-1_$(if [ "$(uname -m)" = "aarch64" ]; then echo "arm64"; else echo "amd64"; fi) && \
+wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-${NSIGHT_VERSION}.deb && \
+apt-get update && apt-get install -y libxcb-cursor0 && \
+apt-get install -y ./nsight-systems-${NSIGHT_VERSION}.deb && \
 rm -rf /usr/local/cuda/bin/nsys && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys /usr/local/cuda/bin/nsys && \
 rm -rf /usr/local/cuda/bin/nsys-ui && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
-rm nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys-ui /usr/local/cuda/bin/nsys-ui && \
+rm nsight-systems-${NSIGHT_VERSION}.deb


 # =========================

docker/verl0.6.1-experimental/Dockerfile.vllm012exp

Lines changed: 7 additions & 7 deletions
@@ -23,15 +23,15 @@ RUN pip install flash_attn

 RUN apt update && apt install numactl

-RUN wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
-apt-get update && apt-get install -y libxcb-cursor0
-
-RUN apt-get install -y ./nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb && \
+RUN NSIGHT_VERSION=2025.6.1_2025.6.1.190-1_$(if [ "$(uname -m)" = "aarch64" ]; then echo "arm64"; else echo "amd64"; fi) && \
+wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_6/nsight-systems-${NSIGHT_VERSION}.deb && \
+apt-get update && apt-get install -y libxcb-cursor0 && \
+apt-get install -y ./nsight-systems-${NSIGHT_VERSION}.deb && \
 rm -rf /usr/local/cuda/bin/nsys && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys /usr/local/cuda/bin/nsys && \
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys /usr/local/cuda/bin/nsys && \
 rm -rf /usr/local/cuda/bin/nsys-ui && \
-ln -s /opt/nvidia/nsight-systems/2025.6.1/target-linux-x64/nsys-ui /usr/local/cuda/bin/nsys-ui && \
-rm nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb
+ln -s /opt/nvidia/nsight-systems/2025.6.1/nsys-ui /usr/local/cuda/bin/nsys-ui && \
+rm nsight-systems-${NSIGHT_VERSION}.deb


 # =========================
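
All four Dockerfiles now share the same inline architecture switch, which can be tried outside a build. The sketch below is an illustration, not part of the commit. `nsight_deb_name` is a hypothetical helper that takes the machine string as an argument (defaulting to `uname -m`), so both branches can be exercised on any host.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): mirrors the inline
# `$(if [ "$(uname -m)" = "aarch64" ]; then ... fi)` expression from the diffs.
nsight_deb_name() {
    machine="${1:-$(uname -m)}"   # allow overriding the detected machine type
    if [ "$machine" = "aarch64" ]; then
        arch="arm64"
    else
        arch="amd64"              # any non-aarch64 machine falls back to amd64, as in the diff
    fi
    echo "nsight-systems-2025.6.1_2025.6.1.190-1_${arch}.deb"
}

nsight_deb_name aarch64   # prints nsight-systems-2025.6.1_2025.6.1.190-1_arm64.deb
nsight_deb_name x86_64    # prints nsight-systems-2025.6.1_2025.6.1.190-1_amd64.deb
```

As in the commit, the `else` branch treats every machine type other than `aarch64` as `amd64`, so any third architecture would end up requesting the amd64 package.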
