Conversation

@sayakpaul (Member) commented:

What does this PR do?

We should handle logging (including progress bars) gracefully when operating under distributed setups, so that each message and progress bar is emitted once rather than once per rank.
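
A minimal sketch of the intended behavior (the `_is_rank_zero` helper here is hypothetical; the actual implementation is in the diff below): gate progress bars on the process rank so that only rank 0 emits them.

```python
import torch.distributed as dist
from tqdm import tqdm

def _is_rank_zero() -> bool:
    # Hypothetical helper: non-distributed runs count as rank 0.
    if not (dist.is_available() and dist.is_initialized()):
        return True
    return dist.get_rank() == 0

# Only rank 0 draws the bar; other ranks stay silent instead of each
# printing its own duplicate "Loading pipeline components..." bar.
for _ in tqdm(range(7), desc="Loading pipeline components...",
              disable=not _is_rank_zero()):
    pass  # load one pipeline component per iteration
```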

Before this PR:
Loading pipeline components...:   0%|                                                                                | 0/7 [00:00<?, ?it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...:  29%|████████████████████▌                                                   | 2/7 [00:00<00:00,  6.76it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 59.58it/s]
Loading pipeline components...:  57%|█████████████████████████████████████████▏                              | 4/7 [00:00<00:00,  9.45it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 46.15it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 60.59it/s]
Loading pipeline components...:  86%|█████████████████████████████████████████████████████████████▋          | 6/7 [00:00<00:00, 10.19it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 48.16it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  8.95it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  8.48it/s]
Attention backends are an experimental feature and the API may be subject to change.
`enable_parallelism` is an experimental feature. The API may change in the future and breaking changes may be introduced at any time without warning.
Attention backends are an experimental feature and the API may be subject to change.
`enable_parallelism` is an experimental feature. The API may change in the future and breaking changes may be introduced at any time without warning.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  6.64it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  6.63it/s]
With this PR:
Loading pipeline components...:   0%|                                                                                | 0/7 [00:00<?, ?it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...:  14%|██████████▎                                                             | 1/7 [00:00<00:02,  2.34it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading pipeline components...:  43%|██████████████████████████████▊                                         | 3/7 [00:00<00:00,  5.90it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 61.48it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 62.15it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 47.45it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  8.62it/s]
Attention backends are an experimental feature and the API may be subject to change.
`enable_parallelism` is an experimental feature. The API may change in the future and breaking changes may be introduced at any time without warning.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:07<00:00,  6.64it/s]

Notice that there's still:

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 61.48it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 62.15it/s]

This comes from the T5 text encoder. We cannot control it from diffusers, since that progress bar is emitted by transformers.
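
As a workaround, users can silence the remaining transformers output themselves on non-zero ranks. A sketch using transformers' public logging utilities (not part of this PR):

```python
import torch.distributed as dist
from transformers.utils import logging as hf_logging

if dist.is_initialized() and dist.get_rank() != 0:
    hf_logging.set_verbosity_error()   # suppress info/warning messages
    hf_logging.disable_progress_bar()  # hide the shard-loading bars
```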

Test Code
import torch
from torch import distributed as dist
from diffusers import DiffusionPipeline, ContextParallelConfig

def setup_distributed():
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    device = torch.device(f"cuda:{dist.get_rank()}")
    torch.cuda.set_device(device)
    return device

device = setup_distributed()

ulysses_degree = torch.distributed.get_world_size()
pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16,
).to(device)
pipeline.transformer.set_attention_backend("_native_cudnn")
pipeline.transformer.enable_parallelism(
    config=ContextParallelConfig(ulysses_degree=ulysses_degree)
)

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""

generator = torch.Generator().manual_seed(42)
image = pipeline(prompt, guidance_scale=3.5, num_inference_steps=50, generator=generator).images[0]

if dist.get_rank() == 0:
    image.save("output_ulysses.png")
if dist.is_initialized():
    dist.destroy_process_group()
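
For reference, a script like this is launched with one process per GPU, e.g. `torchrun --nproc_per_node=2 test.py` (filename assumed). `torchrun` supplies the environment that `dist.init_process_group(backend="nccl")` reads, and every rank executes the same script, which is why each log line above used to appear once per rank.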

@sayakpaul requested a review from DN6 on December 8, 2025, 13:40
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +16 to +19
try:
    import torch
except ImportError:
    torch = None
@sayakpaul (Member, author) commented:

To avoid introducing a circular import problem.

@sayakpaul requested a review from dg845 on December 9, 2025, 03:25
torch = None


def is_torch_dist_rank_zero() -> bool:
@dg845 (Collaborator) commented:

Is this robust to different distributed setups such as accelerate, pure deepspeed, etc.? As a concrete example, I'm thinking about the case where diffusers models are used in distributed training and whether these changes would work as expected in that case.

@sayakpaul (Member, author) replied:
I don't think we can control external dependencies much, e.g. the logs from transformers.
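
(The helper's body isn't shown in the diff excerpt above. A plausible sketch, assuming only the guarded torch import from the diff and not necessarily matching the merged code:)

```python
def is_torch_dist_rank_zero() -> bool:
    # Sketch only; the merged implementation may differ.
    if torch is None:
        # torch import is guarded above to avoid circular imports.
        return True
    if not (torch.distributed.is_available() and torch.distributed.is_initialized()):
        # Single-process runs count as rank 0. Launchers such as accelerate
        # or DeepSpeed that initialize a torch.distributed process group
        # fall through to the rank check below.
        return True
    return torch.distributed.get_rank() == 0
```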

@dg845 (Collaborator) left a review:

LGTM! Left some questions.

@sayakpaul requested a review from dg845 on December 10, 2025, 03:16
@dg845 (Collaborator) left a review:

Thanks!
