# Run ComfyUI with ROCm on AMD GPU

English | 🀄 中文说明


> [!TIP]
> This image uses the latest ROCm 7, which is recommended only for Radeon RX 9000 series (RDNA 4) and Ryzen AI 300 series (RDNA 3.5) users.

> [!TIP]
> If you are using an RX 7000 series (RDNA 3) GPU, you may want to use the `rocm6` image for a more stable experience.

## Prerequisites

### Essential Environment Variables

You need to add these configurations to the `docker run` or `podman run` command below.

| GPU Architecture | Configuration to add | GPU Model |
| --- | --- | --- |
| RDNA 4 (gfx1201) | `-e HSA_OVERRIDE_GFX_VERSION=12.0.1 \` | Radeon AI PRO R9700<br>Radeon RX 9070 XT<br>Radeon RX 9070<br>Radeon RX 9070 GRE |
| RDNA 4 (gfx1200) | `-e HSA_OVERRIDE_GFX_VERSION=12.0.0 \` | Radeon RX 9060 XT<br>Radeon RX 9060 |
| RDNA 3.5 (gfx1151) | `-e HSA_OVERRIDE_GFX_VERSION=11.5.1 \`<br>`-e HIP_VISIBLE_DEVICES=0 \` | Ryzen AI Max+ 395<br>Ryzen AI Max 390<br>Ryzen AI Max 385 |
| RDNA 3.5 (gfx1150) | `-e HSA_OVERRIDE_GFX_VERSION=11.5.0 \`<br>`-e HIP_VISIBLE_DEVICES=0 \` | Ryzen AI 9 HX 375<br>Ryzen AI 9 HX 370<br>Ryzen AI 9 365 |
| RDNA 3 (gfx1101) | `-e HSA_OVERRIDE_GFX_VERSION=11.0.1 \` | Radeon PRO W7700<br>Radeon PRO V710<br>Radeon RX 7800 XT<br>Radeon RX 7700 XT |
| RDNA 3 (gfx1100) | `-e HSA_OVERRIDE_GFX_VERSION=11.0.0 \` | Radeon PRO W7900<br>Radeon PRO W7800<br>Radeon RX 7900 XTX<br>Radeon RX 7900 XT<br>Radeon RX 7900 GRE |
| RDNA 2 (gfx1030) | `-e HSA_OVERRIDE_GFX_VERSION=10.3.0 \` | Radeon PRO W6800<br>Radeon PRO V620 |

- Radeon RX 6000 series GPUs may work, but there is no guarantee.
  - They are not officially supported by ROCm 7, but the RDNA 2 PRO GPUs (W6800, V620) are, and all RDNA 2 GPUs share the same build target, gfx1030.

Reference: AMD GPU arch specs (Thanks to nhtua)
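To pick the right value for your GPU, you can look up its build target with `rocminfo` on the host and map it to an override version. The helper below is a sketch based on the table above; the `gfx_to_override` function name is ours, not part of ROCm:

```shell
# Map a ROCm build target (as reported by rocminfo) to the matching
# HSA_OVERRIDE_GFX_VERSION value from the table above.
gfx_to_override() {
  case "$1" in
    gfx1201) echo "12.0.1" ;;
    gfx1200) echo "12.0.0" ;;
    gfx1151) echo "11.5.1" ;;
    gfx1150) echo "11.5.0" ;;
    gfx1101) echo "11.0.1" ;;
    gfx1100) echo "11.0.0" ;;
    gfx1030) echo "10.3.0" ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Example usage on a host with ROCm installed:
#   gfx_to_override "$(rocminfo | grep -o -m1 'gfx[0-9a-f]*')"
```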

## Run

### Using Docker

```bash
mkdir -p \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows

# Alternatively, create a single "storage" directory and replace the
# individual -v mounts below with:
#   -v "$(pwd)"/storage:/root \

docker run \
  --name comfyui-rocm7 \
  --device=/dev/kfd --device=/dev/dri \
  --group-add=video --group-add=render \
  --ipc=host --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --security-opt label=disable \
  -p 8188:8188 \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e HSA_OVERRIDE_GFX_VERSION="" \
  -e CLI_ARGS="" \
  yanwk/comfyui-boot:rocm7
```

### Using Podman

```bash
mkdir -p \
  storage-models/models \
  storage-models/hf-hub \
  storage-models/torch-hub \
  storage-user/input \
  storage-user/output \
  storage-user/workflows

# Alternatively, create a single "storage" directory and replace the
# individual -v mounts below with:
#   -v "$(pwd)"/storage:/root \

podman run \
  --name comfyui-rocm7 \
  --device=/dev/kfd --device=/dev/dri \
  --group-add=video --group-add=render \
  --ipc=host --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --security-opt label=disable \
  -p 8188:8188 \
  -v "$(pwd)"/storage-models/models:/root/ComfyUI/models \
  -v "$(pwd)"/storage-models/hf-hub:/root/.cache/huggingface/hub \
  -v "$(pwd)"/storage-models/torch-hub:/root/.cache/torch/hub \
  -v "$(pwd)"/storage-user/input:/root/ComfyUI/input \
  -v "$(pwd)"/storage-user/output:/root/ComfyUI/output \
  -v "$(pwd)"/storage-user/workflows:/root/ComfyUI/user/default/workflows \
  -e HSA_OVERRIDE_GFX_VERSION="" \
  -e CLI_ARGS="" \
  docker.io/yanwk/comfyui-boot:rocm7
```

Once the app is loaded, visit http://localhost:8188/
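If you script the startup, a small polling loop can wait for the web UI to answer before you open the browser. This is a sketch, not part of the image; it assumes `curl` is available on the host and the default port mapping above:

```shell
# Poll the ComfyUI web UI until it responds, or give up after the
# given number of one-second attempts (default 60).
wait_for_comfyui() {
  url="${1:-http://localhost:8188/}"
  tries="${2:-60}"
  i=0
  until curl -sf "$url" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      echo "ComfyUI did not respond at $url" >&2
      return 1
    fi
    sleep 1
  done
}

# Example: wait_for_comfyui && xdg-open http://localhost:8188/
```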

### Optional Environment Variables

You may also want to add more environment variables:

- Force ComfyUI to offload model weights from VRAM to RAM more frequently. This slows performance but reduces memory leaks (Source).

  `-e CLI_ARGS="--disable-smart-memory" \`

- Disable the internal memory fragment allocator, to mitigate memory faults (Doc). (Thanks to SergeyFilippov)

  `-e HSA_DISABLE_FRAGMENT_ALLOCATOR=1 \`

- Enable tunable operations: a slower first run, but faster subsequent runs (Doc1, Doc2). (Thanks to SergeyFilippov)

  `-e PYTORCH_TUNABLEOP_ENABLED=1 \`
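If you use several of these flags at once, it can be convenient to collect them in a shell array (requires bash) and splice it into the `docker run` or `podman run` command from the Run section. A sketch; keep only the flags you actually want:

```shell
# Optional tuning flags collected in one place; append "${extra_env[@]}"
# to the docker run / podman run command before the image name.
extra_env=(
  -e CLI_ARGS="--disable-smart-memory"
  -e HSA_DISABLE_FRAGMENT_ALLOCATOR=1
  -e PYTORCH_TUNABLEOP_ENABLED=1
)

# Example:
#   docker run ... "${extra_env[@]}" yanwk/comfyui-boot:rocm7
```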