Description
Describe the bug
When running ComfyUI on RunPod using the runpod/worker-comfyui Docker image, the ComfyUI server intermittently becomes unreachable and requests fail.
On the worker side I sometimes get:
ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.
The issue appears sporadically during normal operation: it may persist for a few minutes and then resolve on its own, only to reappear later. It is not tied to a specific workflow and has occurred across multiple template versions.
Repro MVP
On RunPod, start a pod based on the ComfyUI template, using a custom Dockerfile:
FROM runpod/worker-comfyui:5.6.0-base
# (customizations for my use case; custom nodes/workflows/scripts)
Then:
Start the pod.
Trigger ComfyUI workflows via the worker.
After some time under normal usage (multiple workflow executions over time), observe that:
Some requests fail with
ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.
After a few minutes, the error may disappear without any manual intervention.
At a later point, the same issue can reappear.
This behavior reproduces on different GPU types (currently running on RTX 5090) and across multiple versions of this template (not just 5.6.0).
Expected behavior
We should be able to reliably execute ComfyUI workflows.
No recurring intermittent periods where the ComfyUI server is unreachable.
If ComfyUI crashes or restarts, it should either:
Be restarted in a controlled way, or
Surface a clearer error/log indicating why the server is not listening.
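To illustrate what a clearer error could look like, here is a minimal sketch of a TCP probe that reports why the port is not answering. The function name `probe_server` is hypothetical and is not part of the worker-comfyui code; only the address 127.0.0.1:8188 comes from the error message above.

```python
import socket


def probe_server(host="127.0.0.1", port=8188, timeout=2.0):
    """Try a plain TCP connect and return (reachable, detail).

    Hypothetical helper for illustration; not part of worker-comfyui.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # ConnectionRefusedError usually means nothing is listening on the
        # port (for example, the server process exited). A timeout usually
        # means the port did not answer in time (for example, a hung process).
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    reachable, detail = probe_server()
    print(f"ComfyUI reachable={reachable} ({detail})")
```

Logging the exception type alongside the existing "not reachable" message would at least separate "process is gone" from "process is stuck".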
Screenshots
Versions (please complete the following information):
Docker base image: runpod/worker-comfyui:5.6.0-base
Template version (RunPod): 5.6.0 (issue also reproduced on earlier template versions)
ComfyUI version inside the image: [not sure – whatever is bundled with 5.6.0 template]
Host environment: RunPod GPU cloud
GPU types tried:
RTX 5090 (currently in use)
Other GPU types on RunPod (same issue)
Additional context
I am using the RunPod ComfyUI worker template as a base and extending it via Docker (FROM runpod/worker-comfyui:5.6.0-base).
The error:
- Appears sporadically during normal operation.
- Sometimes resolves by itself after a few minutes without restarting the pod.
- Has occurred across multiple versions of this template.
I tried:
- Changing GPU types → no improvement.
- Updating to the latest available template version → issue persists.
I would appreciate guidance on:
- How to restart or recover the ComfyUI server gracefully from inside the container or via the worker logic.
- Any known issues or recommended configuration for long-running RunPod ComfyUI workers to avoid this intermittent “server not reachable” behavior.
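For reference, this is roughly the kind of recovery logic I have in mind on the worker side: a minimal sketch that retries a reachability check with exponential backoff before failing the job. `wait_until_reachable` and its parameters are hypothetical names, and `check` stands for any callable that returns True when the server answers; none of this is the worker's actual implementation.

```python
import time


def wait_until_reachable(check, attempts=5, base_delay=0.5, max_delay=8.0,
                         sleep=time.sleep):
    """Call check() up to `attempts` times with exponential backoff.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. Hypothetical helper for illustration only.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            # Wait before the next attempt, doubling the delay up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

If this returns None, the worker could log the failure and decide whether to restart the ComfyUI process or fail the job. Which of those is appropriate is exactly what I am unsure about.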
