
[BUG] Sporadic errors - "ComfyUI server (127.0.0.1:8188) not reachable after multiple retries." #186

@HaimBendanan

Description


Describe the bug

When running ComfyUI on RunPod using the runpod/worker-comfyui Docker image, the ComfyUI server intermittently becomes unreachable and returns errors.

From the worker side I sometimes get:

ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.

The issue appears sporadically during normal operation: it may persist for a few minutes and then resolve on its own, only to reappear later. It is not tied to a specific workflow and has occurred across multiple template versions.

Repro MVP

On RunPod, start a pod based on the ComfyUI template, using a custom Dockerfile:

FROM runpod/worker-comfyui:5.6.0-base
# (customizations for my use case; custom nodes/workflows/scripts)

Then:

Start the pod.

Trigger ComfyUI workflows via the worker.

After some time under normal usage (multiple workflow executions over time), observe that:

Some requests fail with

ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.

After a few minutes, the error may disappear without any manual intervention.

At a later point, the same issue can reappear.

This behavior reproduces on different GPU types (currently running on RTX 5090) and across multiple versions of this template (not just 5.6.0).
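I don't know exactly how the worker implements its reachability check, but the failure mode can be illustrated with a minimal retry probe against ComfyUI's HTTP API. This is a sketch only: `wait_for_comfyui` and its parameters are names I made up, and I'm assuming the `/system_stats` endpoint exposed by the bundled ComfyUI server.

```python
import time
import urllib.error
import urllib.request


def wait_for_comfyui(url="http://127.0.0.1:8188/system_stats",
                     retries=10, delay=1.0, probe=None):
    """Return True once the ComfyUI HTTP endpoint answers, or False
    after exhausting `retries` attempts (the "not reachable after
    multiple retries" case). `probe` can be injected for testing."""
    def default_probe():
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or default_probe
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False
```

During the sporadic failure windows, a loop like this exhausts all retries even though the pod itself is healthy, which matches the error text above.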

Expected behavior

We should be able to reliably execute ComfyUI workflows.

No recurring intermittent periods where the ComfyUI server is unreachable.

If ComfyUI crashes or restarts, it should either:

Be restarted in a controlled way, or

Surface a clearer error/log indicating why the server is not listening.
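To make the "controlled restart" expectation concrete, here is a rough sketch of a watchdog that restarts the ComfyUI process when it exits or stops answering health checks. This is hypothetical illustration code, not part of the worker; `supervise`, `is_healthy`, and `max_checks` (which only exists to bound the loop) are invented names.

```python
import subprocess
import time


def supervise(cmd, is_healthy, check_interval=5.0, max_checks=None):
    """Restart `cmd` whenever it exits or fails `is_healthy`.
    Returns the number of restarts performed. `max_checks` bounds
    the loop; leave it as None for an indefinite watchdog."""
    proc = subprocess.Popen(cmd)
    restarts = 0
    checks = 0
    try:
        while max_checks is None or checks < max_checks:
            time.sleep(check_interval)
            checks += 1
            if proc.poll() is not None:
                # Process died: surface the exit code, then restart.
                print(f"ComfyUI exited with code {proc.returncode}; restarting")
                proc = subprocess.Popen(cmd)
                restarts += 1
            elif not is_healthy():
                # Process alive but not answering: recycle it.
                print("ComfyUI unresponsive; restarting")
                proc.terminate()
                proc.wait(timeout=10)
                proc = subprocess.Popen(cmd)
                restarts += 1
    finally:
        if proc.poll() is None:
            proc.terminate()
    return restarts
```

Either branch also logs why the restart happened, which would address the "clearer error/log" point above.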

Screenshots

See errors in metrics:

Versions (please complete the following information):

Docker base image: runpod/worker-comfyui:5.6.0-base

Template version (RunPod): 5.6.0 (issue also reproduced on earlier template versions)

ComfyUI version inside the image: [not sure – whatever is bundled with 5.6.0 template]

Host environment: RunPod GPU cloud

GPU types tried:

RTX 5090 (currently in use)

Other GPU types on RunPod (same issue)

Additional context

I am using the RunPod ComfyUI worker template as a base and extending it via Docker (FROM runpod/worker-comfyui:5.6.0-base).

The error:

  • Appears sporadically during normal operation.
  • Sometimes resolves by itself after a few minutes without restarting the pod.
  • Has occurred across multiple versions of this template.

I tried:

  • Changing GPU types → no improvement.
  • Updating to the latest available template version → issue persists.

I would appreciate guidance on:

  • How to detect when the ComfyUI server becomes unreachable and restart or recover it gracefully from inside the container or via the worker logic.
  • Any known issues or recommended configuration for long-running RunPod ComfyUI workers to avoid this intermittent “server not reachable” behavior.
