
Commit 8291491

jamesbrink and pyqlsa authored
feat: add AMD ROCm GPU support (#30)
* feat: initial amd gpu support with rocm 7.1

* formatting

* fix: resolve bugs and clean up ROCm support from PR #28

Cherry-picked pyqlsa's ROCm implementation (commits 7e6f796, e2343fe) and applied fixes on top:

- Fix ROCm app incorrectly named "cuda" in nix/apps.nix (would shadow CUDA)
- Fix mkPython passing boolean `false` instead of string `"none"` for gpuSupport
- Fix flake description still referencing v0.12.2 instead of v0.14.2
- Fix README Podman ROCm example referencing latest-cuda instead of latest-rocm
- Fix ROCm torchaudio missing FFmpeg/sox ignore deps (matching CUDA pattern)
- Resolve all TODO/XXX review comments left by pyqlsa
- Update CHANGELOG, CLAUDE.md, and README with ROCm documentation
- Fix CHANGELOG footer links (missing v0.14.2 link, Unreleased pointing to v0.12.2)

Closes #27

Co-authored-by: pyqlsa <26353308+pyqlsa@users.noreply.github.com>
1 parent be15424 commit 8291491
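The fix list above turns on one detail worth spelling out: `gpuSupport` is now a string enum rather than the old `cudaSupport` boolean, which is why `mkPython` must be handed `"none"` instead of `false` for the CPU build. A minimal sketch of that shape, using hypothetical attribute names and wheel labels rather than the repository's actual `nix/` code:

```nix
# Illustrative only: "mkPython" and "gpuSupport" come from the commit message;
# the wheel labels and validation below are assumptions, not the real flake code.
let
  mkPython =
    { gpuSupport ? "none" }:
    if !(builtins.elem gpuSupport [ "cuda" "rocm" "none" ]) then
      throw "gpuSupport must be \"cuda\", \"rocm\", or \"none\", got ${builtins.toJSON gpuSupport}"
    else {
      torchWheel =
        if gpuSupport == "cuda" then "torch+cu124"         # pre-built CUDA wheel
        else if gpuSupport == "rocm" then "torch+rocm7.1"  # pre-built ROCm wheel
        else "torch-cpu";                                  # nixpkgs PyTorch
    };
in
{
  cpu  = mkPython { gpuSupport = "none"; };  # the string "none", not boolean false
  cuda = mkPython { gpuSupport = "cuda"; };
  rocm = mkPython { gpuSupport = "rocm"; };
}
```

In the sketch, passing `false` fails the `elem` check instead of silently selecting the wrong branch; the actual fix in this commit was simply to pass the string `"none"` at the call site.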

File tree

11 files changed (+521, -82 lines)


.github/workflows/build.yml

Lines changed: 42 additions & 6 deletions
@@ -105,6 +105,13 @@ jobs:
             platform: linux/amd64
             tag_suffix: '-cuda'
             runner: ubuntu-latest
+          - variant: rocm
+            arch: x86_64-linux
+            nix_target: dockerImageRocm
+            package_target: rocm
+            platform: linux/amd64
+            tag_suffix: '-rocm'
+            runner: ubuntu-latest
 
     steps:
       - name: Checkout repository
@@ -160,8 +167,8 @@ jobs:
           sudo rm -rf /var/lib/apt/lists/*
           sudo apt-get clean
           docker system prune -af || true
-          # Extra cleanup for CUDA builds (needs maximum disk space)
-          if [[ "${{ matrix.variant }}" == "cuda" ]]; then
+          # Extra cleanup for CUDA/ROCm builds (needs maximum disk space)
+          if [[ "${{ matrix.variant }}" =~ ^cuda$|^rocm$ ]]; then
             sudo rm -rf /usr/local/share/powershell
             sudo rm -rf /usr/local/share/chromium
             sudo rm -rf /usr/local/share/vcpkg
@@ -217,8 +224,8 @@ jobs:
       - name: Build Docker image with Nix
         run: |
           echo "Building ${{ matrix.variant }} variant for ${{ matrix.arch }}..."
-          # For CUDA builds, limit parallelism to reduce memory pressure
-          if [[ "${{ matrix.variant }}" == "cuda" ]]; then
+          # For CUDA/ROCm builds, limit parallelism to reduce memory pressure
+          if [[ "${{ matrix.variant }}" =~ ^cuda$|^rocm$ ]]; then
             nix build .#packages.${{ matrix.arch }}.${{ matrix.nix_target }} --print-build-logs --max-jobs 1 --cores 2
           else
             nix build .#packages.${{ matrix.arch }}.${{ matrix.nix_target }} --print-build-logs
@@ -264,8 +271,8 @@ jobs:
           # Build the actual package (not Docker image) for caching
           nix build .#packages.${{ matrix.arch }}.${{ matrix.package_target }} -o result-pkg --print-build-logs
 
-          # For CUDA builds, include build-time dependencies (magma, triton, etc.)
-          if [[ "${{ matrix.variant }}" == "cuda" ]]; then
+          # For CUDA/ROCm builds, include build-time dependencies (magma, triton, etc.)
+          if [[ "${{ matrix.variant }}" =~ ^cuda$|^rocm$ ]]; then
            ./scripts/push-to-cachix.sh --build-deps ./result-pkg
           else
            ./scripts/push-to-cachix.sh ./result-pkg
@@ -367,6 +374,31 @@ jobs:
           docker buildx imagetools create --tag "$CUDA_TAG" "$CUDA_AMD64_TAG"
           docker buildx imagetools create --tag "$VERSION_CUDA_TAG" "$CUDA_AMD64_TAG"
 
+      - name: Create and push ROCm manifest (x86_64 only)
+        run: |
+          if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
+            ROCM_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-rocm"
+            VERSION_ROCM_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.meta.outputs.version }}-rocm"
+            ROCM_AMD64_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-rocm-amd64"
+          elif [[ "${{ github.ref }}" == refs/tags/v* ]]; then
+            TAG_VERSION=${GITHUB_REF#refs/tags/v}
+            ROCM_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${TAG_VERSION}-rocm"
+            VERSION_ROCM_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-rocm"
+            ROCM_AMD64_TAG="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${TAG_VERSION}-rocm-amd64"
+          fi
+
+          echo "Verifying ROCm image exists..."
+          if ! docker manifest inspect "$ROCM_AMD64_TAG" &>/dev/null; then
+            echo "ERROR: Required ROCm image not found: $ROCM_AMD64_TAG"
+            exit 1
+          fi
+          echo "Found: $ROCM_AMD64_TAG"
+
+          # Use buildx imagetools to create tags without pulling the full image
+          echo "Creating ROCm tags via registry (no local pull required)..."
+          docker buildx imagetools create --tag "$ROCM_TAG" "$ROCM_AMD64_TAG"
+          docker buildx imagetools create --tag "$VERSION_ROCM_TAG" "$ROCM_AMD64_TAG"
+
       - name: Generate manifest summary
         run: |
           echo "## Multi-Architecture Docker Images Published" >> $GITHUB_STEP_SUMMARY
@@ -382,6 +414,10 @@ jobs:
           echo '```bash' >> $GITHUB_STEP_SUMMARY
           echo "docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-cuda" >> $GITHUB_STEP_SUMMARY
           echo '```' >> $GITHUB_STEP_SUMMARY
+          echo "### ROCM (amd64 only)" >> $GITHUB_STEP_SUMMARY
+          echo '```bash' >> $GITHUB_STEP_SUMMARY
+          echo "docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-rocm" >> $GITHUB_STEP_SUMMARY
+          echo '```' >> $GITHUB_STEP_SUMMARY
 
   update-description:
     needs: create-manifests

CHANGELOG.md

Lines changed: 12 additions & 1 deletion
@@ -9,8 +9,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [0.14.2] - 2026-02-19
 
+### Added
+- AMD ROCm GPU support via pre-built PyTorch wheels (ROCm 7.1, tested on gfx1100/7900 XTX) (#27)
+- `nix run .#rocm` app, `dockerImageRocm`, and NixOS module `gpuSupport = "rocm"` option
+- ROCm Docker images and CI pipeline (`ghcr.io/utensils/comfyui-nix:latest-rocm`)
+- ROCm dev shell (`nix develop .#rocm`)
+
 ### Changed
 - Upgraded ComfyUI from v0.12.2 to v0.14.2 (5 upstream releases)
+- Replaced `cudaSupport` boolean with `gpuSupport` enum (`"cuda"`, `"rocm"`, `"none"`) across flake, packages, and NixOS module
 - Updated `comfyui-frontend-package` 1.37.11 → 1.38.14
 - Updated `comfyui-workflow-templates` 0.8.31 → 0.8.43
 - Updated `comfyui-workflow-templates-core` 0.3.124 → 0.3.147
@@ -36,6 +43,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - More efficient rope implementation for LLaMA, torch RMSNorm for Flux models
 - New API nodes: Magnific Upscalers, Bria RMBG, Recraft V4, Vidu Q3 Turbo, Kling V3/O3, Tencent 3D
 
+### Fixed
+- Pin template input URLs to commit SHA instead of mutable `refs/heads/main` branch ref, preventing `hash mismatch in fixed-output derivation` errors when upstream changes files (#25)
+
 ## [0.12.2] - 2026-02-07
 
 ### Changed
@@ -176,7 +186,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Apple Silicon (M-series) support
 - Basic persistence for user data
 
-[Unreleased]: https://github.com/utensils/comfyui-nix/compare/v0.12.2...HEAD
+[Unreleased]: https://github.com/utensils/comfyui-nix/compare/v0.14.2...HEAD
+[0.14.2]: https://github.com/utensils/comfyui-nix/compare/v0.12.2...v0.14.2
 [0.12.2]: https://github.com/utensils/comfyui-nix/compare/v0.7.0-2...v0.12.2
 [0.7.0-2]: https://github.com/utensils/comfyui-nix/compare/v0.7.0-1...v0.7.0-2
 [0.7.0-1]: https://github.com/utensils/comfyui-nix/compare/v0.7.0...v0.7.0-1
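The `Fixed` entry above is easiest to see in terms of Nix fixed-output fetches: a URL that points at `refs/heads/main` can change underneath the recorded hash, while a URL keyed to a commit SHA cannot. A minimal sketch with placeholder URLs and hashes, not the project's real template inputs:

```nix
{ fetchurl }:
{
  # Fragile: refs/heads/main is a moving target. When upstream edits the file,
  # the recorded hash no longer matches and the fetch fails with
  # "hash mismatch in fixed-output derivation".
  fromBranchRef = fetchurl {
    url = "https://raw.githubusercontent.com/example-org/example-repo/refs/heads/main/templates/index.json";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };

  # Stable: a commit SHA names exactly one immutable revision of the file, so
  # the pinned hash stays valid until the pin itself is updated.
  fromCommitSha = fetchurl {
    url = "https://raw.githubusercontent.com/example-org/example-repo/0123456789abcdef0123456789abcdef01234567/templates/index.json";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };
}
```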

CLAUDE.md

Lines changed: 9 additions & 4 deletions
@@ -12,6 +12,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 - **Run application**: `nix run` (default)
 - **Run with browser**: `nix run -- --open` (automatically opens browser)
 - **Run with CUDA**: `nix run .#cuda` (Linux/NVIDIA only, uses pre-built PyTorch CUDA wheels)
+- **Run with ROCm**: `nix run .#rocm` (Linux/AMD only, uses pre-built PyTorch ROCm 7.1 wheels)
 - **Run with custom port**: `nix run -- --port=8080`
 - **Run with network access**: `nix run -- --listen 0.0.0.0`
 - **Run with debug logging**: `nix run -- --debug` or `nix run -- --verbose`
@@ -23,10 +24,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 - **Build Docker image**: `nix run .#buildDocker` (creates `comfy-ui:latest`)
 - **Build CUDA Docker**: `nix run .#buildDockerCuda` (creates `comfy-ui:cuda`)
-- **Cross-build Linux images from macOS**: `nix run .#buildDockerLinux`, `nix run .#buildDockerLinuxCuda`, `nix run .#buildDockerLinuxArm64`
-- **Pull pre-built**: `docker pull ghcr.io/utensils/comfyui-nix:latest` (or `:latest-cuda`)
+- **Build ROCm Docker**: `nix run .#buildDockerRocm` (creates `comfy-ui:rocm`)
+- **Cross-build Linux images from macOS**: `nix run .#buildDockerLinux`, `nix run .#buildDockerLinuxCuda`, `nix run .#buildDockerLinuxRocm`, `nix run .#buildDockerLinuxArm64`
+- **Pull pre-built**: `docker pull ghcr.io/utensils/comfyui-nix:latest` (or `:latest-cuda`, `:latest-rocm`)
 - **Run container**: `docker run -p 8188:8188 -v $PWD/data:/data comfy-ui:latest`
 - **Run CUDA container**: `docker run --gpus all -p 8188:8188 -v $PWD/data:/data comfy-ui:cuda`
+- **Run ROCm container**: `docker run --device /dev/kfd --device /dev/dri -p 8188:8188 -v $PWD/data:/data comfy-ui:rocm`
 
 ## Linting and Code Quality
 
@@ -46,13 +49,13 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Version Management
 
-- Current ComfyUI version: v0.12.2 (pinned in `nix/versions.nix`)
+- Current ComfyUI version: v0.14.2 (pinned in `nix/versions.nix`)
 - To update ComfyUI: modify `version`, `rev`, and `hash` in `nix/versions.nix`
 - Vendored wheels (spandrel, frontend, docs, etc.) also pinned in `nix/versions.nix`
 - Template input files: auto-generated in `nix/template-inputs.nix`
 - Update with: `./scripts/update-template-inputs.sh && git add nix/template-inputs.nix`
 - Python version: 3.12
-- PyTorch: macOS uses pre-built wheels (2.5.1, pinned to work around MPS bugs on macOS 26); CUDA uses pre-built wheels from pytorch.org (cu124); Linux CPU uses nixpkgs
+- PyTorch: macOS uses pre-built wheels (2.5.1, pinned to work around MPS bugs on macOS 26); CUDA uses pre-built wheels from pytorch.org (cu124); ROCm uses pre-built wheels from pytorch.org (rocm7.1); Linux CPU uses nixpkgs
 
 ## Project Architecture
 
@@ -127,6 +130,7 @@ fonts/ - Bundled fonts for nodes requiring system fonts
 
 - macOS: PyTorch pinned to 2.5.1 to work around MPS bugs on macOS 26 (Tahoe); browser opens via `/usr/bin/open`
 - CUDA: Pre-built wheels from pytorch.org with CUDA 12.4 runtime bundled (no separate toolkit needed); supports Pascal through Hopper
+- ROCm: Pre-built wheels from pytorch.org with ROCm 7.1 runtime bundled; tested on gfx1100 (7900 XTX); `/run/opengl-driver/lib` provides AMD drivers on NixOS
 - Linux CPU: Uses nixpkgs PyTorch; browser opens via `xdg-open`
 - Cross-platform Docker builds work from any system via `nix run .#buildDockerLinux` etc.
 
@@ -139,6 +143,7 @@ fonts/ - Bundled fonts for nodes requiring system fonts
 - Triggers: push to main, version tags (v*), pull requests
 - CPU images: multi-arch (amd64 + arm64 via QEMU)
 - CUDA images: x86_64 only
+- ROCm images: x86_64 only
 - Published to `ghcr.io/utensils/comfyui-nix` (`:latest`, `:latest-cuda`, `:X.Y.Z`)
 
 **Claude Code Integration** (`.github/workflows/claude.yml`, `.github/workflows/claude-code-review.yml`):
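The platform note above mentions that `/run/opengl-driver/lib` supplies the AMD userspace drivers on NixOS. One common way a launcher exposes that path to a prebuilt ROCm wheel is an `LD_LIBRARY_PATH` prefix added with `makeWrapper`; the sketch below is an assumption about the mechanism (including the `comfyPythonEnv` placeholder), not this flake's actual packaging code:

```nix
{ stdenvNoCC, makeWrapper, comfyPythonEnv }:

# Sketch only: wraps a hypothetical ROCm Python environment so the bundled
# PyTorch wheel can locate the host's AMD driver libraries at runtime.
stdenvNoCC.mkDerivation {
  pname = "comfy-ui-rocm-launcher-sketch";
  version = "0";
  dontUnpack = true;
  nativeBuildInputs = [ makeWrapper ];
  installPhase = ''
    mkdir -p $out/bin
    # Prepend the NixOS host driver path (/run/opengl-driver/lib) so the GPU
    # userspace libraries resolve when the wrapped interpreter starts.
    makeWrapper ${comfyPythonEnv}/bin/python $out/bin/comfy-ui-rocm \
      --prefix LD_LIBRARY_PATH : /run/opengl-driver/lib \
      --add-flags "main.py"
  '';
}
```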

README.md

Lines changed: 66 additions & 5 deletions
@@ -26,6 +26,14 @@ nix run github:utensils/comfyui-nix#cuda
 
 CUDA builds use pre-built PyTorch wheels from pytorch.org, so builds are fast (~2GB download) and support all GPU architectures from Pascal (GTX 1080) through Hopper (H100) in a single package.
 
+For ROCm (Linux/AMD):
+
+```bash
+nix run github:utensils/comfyui-nix#rocm
+```
+
+ROCm builds use pre-built PyTorch wheels from pytorch.org, so builds are fast (~2GB download); so far, GPU support has only been validated on `gfx1100`.
+
 ## Options
 
 All [ComfyUI CLI options] are supported. Common examples:
@@ -62,6 +70,21 @@ nix run github:utensils/comfyui-nix#cuda
 
 This single package works on any NVIDIA GPU from the past ~8 years.
 
+## ROCm GPU Support
+
+ROCm builds are available for Linux with AMD GPUs. The `#rocm` package uses pre-built PyTorch wheels from pytorch.org which:
+
+- **Fast builds**: Downloads ~2GB of pre-built wheels instead of compiling for hours
+- **Low memory**: No 30-60GB RAM requirement for compilation
+- **Supported architectures**: To date, only `gfx1100` (7900XTX) has been tested
+- **Bundled runtime**: ROCm 7.1 libraries included in wheels (no separate toolkit needed)
+
+```bash
+nix run github:utensils/comfyui-nix#rocm
+```
+
+> ROCm support contributed by [@pyqlsa](https://github.com/pyqlsa) — thank you!
+
 ## Why a Nix Flake?
 
 ComfyUI's standard installation relies on pip and manual dependency management, which doesn't integrate well with NixOS's declarative approach. This flake provides:
@@ -241,6 +264,7 @@ Add ComfyUI as a package in your system configuration:
         nixpkgs.overlays = [ comfyui-nix.overlays.default ];
         environment.systemPackages = [ pkgs.comfy-ui ];
         # Or for CUDA: pkgs.comfy-ui-cuda
+        # Or for ROCm: pkgs.comfy-ui-rocm
       }];
     };
 
@@ -263,6 +287,7 @@ Add ComfyUI as a package in your system configuration:
   environment.systemPackages = [
     inputs.comfyui-nix.packages.${pkgs.system}.default # CPU
     # inputs.comfyui-nix.packages.${pkgs.system}.cuda # CUDA (Linux)
+    # inputs.comfyui-nix.packages.${pkgs.system}.rocm # ROCm (Linux)
   ];
 }
 ```
@@ -275,6 +300,7 @@ The overlay provides these packages:
 | -------------------- | ------------------------------------------------ |
 | `pkgs.comfy-ui` | CPU + Apple Silicon (Metal) - use this for macOS |
 | `pkgs.comfy-ui-cuda` | NVIDIA GPUs (Linux only, all architectures) |
+| `pkgs.comfy-ui-rocm` | AMD GPUs (Linux only, `gfx1100`) |
 
 > **Note:** On macOS with Apple Silicon, the base `comfy-ui` package automatically uses Metal for GPU acceleration. No separate CUDA package is needed.
 
@@ -288,6 +314,9 @@ nix profile add github:utensils/comfyui-nix
 
 # CUDA (Linux/NVIDIA only)
 nix profile add github:utensils/comfyui-nix#cuda
+
+# ROCm (Linux/AMD only)
+nix profile add github:utensils/comfyui-nix#rocm
 ```
 
 > **Note:** Profile installation is convenient for trying ComfyUI but isn't declarative. For reproducible setups, add the package to your NixOS/nix-darwin configuration instead.
@@ -301,7 +330,8 @@ nix profile add github:utensils/comfyui-nix#cuda
 
 services.comfyui = {
   enable = true;
-  cuda = true; # Enable NVIDIA GPU acceleration (recommended for most users)
+  gpuSupport = "cuda"; # Enable NVIDIA GPU acceleration (recommended for most users)
+  # gpuSupport = "rocm"; # Enable AMD GPU acceleration
   # cudaCapabilities = [ "8.9" ]; # Optional: optimize system CUDA packages for RTX 40xx
   # Note: Pre-built PyTorch wheels already support all GPU architectures
   enableManager = true; # Enable the built-in ComfyUI Manager
@@ -320,7 +350,7 @@ nix profile add github:utensils/comfyui-nix#cuda
 | Option | Default | Description |
 | --------------- | -------------------- | ------------------------------------------------ |
 | `enable` | `false` | Enable the ComfyUI service |
-| `cuda` | `false` | Enable NVIDIA GPU acceleration |
+| `gpuSupport` | `"none"` | Enable NVIDIA or AMD GPU acceleration |
 | `cudaCapabilities` | `null` | Optional CUDA compute capability list |
 | `enableManager` | `false` | Enable the built-in ComfyUI Manager |
 | `port` | `8188` | Port for the web interface |
@@ -347,7 +377,7 @@ To run ComfyUI with data in a user's home directory:
 ```nix
 services.comfyui = {
   enable = true;
-  cuda = true;
+  gpuSupport = "cuda";
   dataDir = "/home/myuser/comfyui-data";
   user = "myuser";
   group = "users";
@@ -393,6 +423,10 @@ docker run -p 8188:8188 -v "$PWD/data:/data" ghcr.io/utensils/comfyui-nix:latest
 # CUDA (x86_64 only, requires nvidia-container-toolkit)
 # Supports ALL GPU architectures: Pascal, Volta, Turing, Ampere, Ada, Hopper
 docker run --gpus all -p 8188:8188 -v "$PWD/data:/data" ghcr.io/utensils/comfyui-nix:latest-cuda
+
+# ROCm (x86_64 only)
+# Supports `gfx1100` (and possibly others)
+docker run --gpus all -p 8188:8188 -v "$PWD/data:/data" ghcr.io/utensils/comfyui-nix:latest-rocm
 ```
 
 **Podman:**
@@ -403,6 +437,10 @@ podman run -p 8188:8188 -v "$PWD/data:/data:Z" ghcr.io/utensils/comfyui-nix:late
 
 # CUDA (requires nvidia-container-toolkit and CDI configured)
 podman run --device nvidia.com/gpu=all -p 8188:8188 -v "$PWD/data:/data:Z" ghcr.io/utensils/comfyui-nix:latest-cuda
+
+# ROCm
+podman run --device /dev/kfd --device /dev/dri -p 8188:8188 -v "$PWD/data:/data:rw" -v "/etc/passwd:/etc/passwd:ro" ghcr.io/utensils/comfyui-nix:latest-rocm
+
 ```
 
 **Passing additional arguments:**
@@ -417,19 +455,42 @@ docker run --gpus all -p 8188:8188 -v "$PWD/data:/data" \
 # Podman with manager enabled
 podman run --device nvidia.com/gpu=all -p 8188:8188 -v "$PWD/data:/data:Z" \
   ghcr.io/utensils/comfyui-nix:latest-cuda --listen 0.0.0.0 --enable-manager
+
+# Podman and ROCm with manager enabled and some recommended settings
+podman run \
+  --device /dev/kfd \
+  --device /dev/dri \
+  -p 8188:8188 \
+  -v "$PWD/data:/data:rw" \
+  -v "/etc/passwd:/etc/passwd:ro" \
+  ghcr.io/utensils/comfyui-nix:latest-rocm \
+  --listen 0.0.0.0 \
+  --enable-manager \
+  --disable-xformers \
+  --use-pytorch-cross-attention
 ```
 
 **Build locally:**
 
 ```bash
-nix run .#buildDocker # CPU
-nix run .#buildDockerCuda # CUDA
+nix build .#dockerImage # CPU
+nix build .#dockerImageCuda # CUDA
+nix build .#dockerImageRocm # ROCm
 
 # Load into Docker/Podman
 docker load < result
 podman load < result
 ```
 
+**Build locally and load in a single step:**
+
+```bash
+# builds image and loads into docker
+nix run .#buildDocker # CPU
+nix run .#buildDockerCuda # CUDA
+nix run .#buildDockerRocm # ROCm
+```
+
 **Note:** Docker/Podman on macOS runs CPU-only. For GPU acceleration on Apple Silicon, use `nix run` directly.
 
 ## Development
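Since the README diff only shows the ROCm value commented out in the NixOS module example, here is a complete flake-level configuration that selects it; the `nixosModules.default` attribute path, the hostname, and the nixpkgs branch are assumptions for illustration, not values confirmed by this commit:

```nix
{
  description = "Example host running ComfyUI with ROCm (illustrative only)";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    comfyui-nix.url = "github:utensils/comfyui-nix";
  };

  outputs = { nixpkgs, comfyui-nix, ... }: {
    # "example-host" is a placeholder; the module attribute path is assumed.
    nixosConfigurations.example-host = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        comfyui-nix.nixosModules.default
        {
          services.comfyui = {
            enable = true;
            gpuSupport = "rocm";   # AMD GPU acceleration via ROCm 7.1 wheels
            enableManager = true;  # built-in ComfyUI Manager
            port = 8188;           # default web interface port
          };
        }
      ];
    };
  };
}
```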
