[BUG] Sporadic errors - "ComfyUI server (127.0.0.1:8188) not reachable after multiple retries." #186

## Describe the bug

When running ComfyUI on RunPod with the `runpod/worker-comfyui` Docker image, the ComfyUI server intermittently becomes unreachable and requests fail.

From the worker side, I sometimes get:

```
ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.
```

The issue appears sporadically during normal operation: it may persist for a few minutes and then resolve on its own, only to reappear later. It is not tied to a specific workflow and has occurred across multiple template versions.


## Repro MVP

On RunPod, start a pod based on the ComfyUI template, using a custom Dockerfile:

```dockerfile
FROM runpod/worker-comfyui:5.6.0-base
# (customizations for my use case; custom nodes/workflows/scripts)
```

Then:

- Start the pod.
- Trigger ComfyUI workflows via the worker.
- After some time under normal usage (many workflow executions over time), observe that:
  - Some requests fail with:
    ```
    ComfyUI server (127.0.0.1:8188) not reachable after multiple retries.
    ```
  - After a few minutes, the error may disappear without any manual intervention.
  - At a later point, the same issue can reappear.

This behavior reproduces on different GPU types (currently running on RTX 5090) and across multiple versions of this template (not just 5.6.0).

## Expected behavior

We should be able to reliably execute ComfyUI workflows.

There should be no recurring periods where the ComfyUI server is unreachable.

If ComfyUI crashes or restarts, it should either:

- Be restarted in a controlled way, or
- Surface a clearer error/log indicating why the server is not listening.
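For reference, here is a minimal sketch of the kind of "clearer error" I have in mind: poll the ComfyUI HTTP port with exponential backoff and, if it never answers, raise an error that states how many attempts were made and how long it waited. This is not the worker's actual retry code; `wait_for_server`, `http_probe`, and the retry/backoff defaults are hypothetical names and values chosen for illustration. It uses only the Python standard library and treats any HTTP response (even an error status) as "the server is listening".

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the server accepted the connection.
        return True
    except OSError:
        # URLError, connection refused and timeouts are all OSError subclasses.
        return False


def wait_for_server(
    url: str,
    retries: int = 5,
    initial_delay: float = 0.5,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `url` with exponential backoff and return the attempt that succeeded.

    Raises RuntimeError with the attempt count and total backoff time
    if the server never becomes reachable (hypothetical helper).
    """
    delay = initial_delay
    waited = 0.0
    for attempt in range(1, retries + 1):
        if probe(url):
            return attempt
        if attempt < retries:
            sleep(delay)
            waited += delay
            delay *= 2  # double the wait before the next attempt
    raise RuntimeError(
        f"{url} not reachable after {retries} attempts (~{waited:.1f}s of backoff)"
    )
```

Usage would be something like `wait_for_server("http://127.0.0.1:8188/")` before submitting a workflow. The `probe` and `sleep` parameters are injectable only so the backoff logic can be exercised without a running server.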

## Screenshots

See errors in metrics:
<img width="970" height="481" alt="Image" src="https://github.com/user-attachments/assets/54b5bb5b-d824-4576-a9db-213d3978546e" />



## Versions

- Docker base image: `runpod/worker-comfyui:5.6.0-base`
- Template version (RunPod): 5.6.0 (issue also reproduced on earlier template versions)
- ComfyUI version inside the image: not sure (whatever is bundled with the 5.6.0 template)
- Host environment: RunPod GPU cloud
- GPU types tried:
  - RTX 5090 (currently in use)
  - Other GPU types on RunPod (same issue)

## Additional context

I am using the RunPod ComfyUI worker template as a base and extending it via Docker (FROM runpod/worker-comfyui:5.6.0-base).

The error:

- Appears sporadically during normal operation.
- Sometimes resolves by itself after a few minutes without restarting the pod.
- Has occurred across multiple versions of this template.

I tried:

- Changing GPU types → no improvement.
- Updating to the latest available template version → issue persists.

I would appreciate guidance on:
- How to restart or recover the ComfyUI server gracefully from inside the container or via the worker logic.
- Any known issues or recommended configuration for long-running RunPod ComfyUI workers to avoid this intermittent “server not reachable” behavior.
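To make the "restart in a controlled way" point concrete, a minimal supervisor loop could look like the sketch below. It assumes ComfyUI runs as a child process of the supervisor; `supervise` and its parameters are hypothetical, and this is not how the image's start script currently launches ComfyUI. Each exit code is logged so a crash leaves a trace; on POSIX, a negative code `-N` means the process was killed by signal `N`.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 3, backoff: float = 1.0) -> List[int]:
    """Run `cmd`, restart it each time it exits, and log every exit code.

    Stops after `max_restarts` restarts and returns the observed exit codes
    (hypothetical helper, illustration only).
    """
    exit_codes: List[int] = []
    for run in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # blocks while the server is running
        exit_codes.append(code)
        # Negative codes on POSIX mean "killed by signal" (e.g. -9 = SIGKILL).
        print(f"[supervisor] run {run + 1} exited with code {code}", file=sys.stderr, flush=True)
        if run < max_restarts:
            # Exponential backoff so a crash loop does not spin.
            time.sleep(backoff * (2 ** run))
    return exit_codes
```

Usage would be something like `supervise([sys.executable, "/comfyui/main.py", "--listen", "127.0.0.1", "--port", "8188"], max_restarts=5)`, where the `/comfyui/main.py` path is a hypothetical placeholder for wherever ComfyUI lives in the image.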
