6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
@@ -500,6 +500,8 @@
title: AuraFlow
- local: api/pipelines/blip_diffusion
title: BLIP-Diffusion
- local: api/pipelines/block_refinement
title: Block Refinement
- local: api/pipelines/bria_3_2
title: Bria 3.2
- local: api/pipelines/bria_fibo
@@ -578,6 +580,8 @@
title: Latent Diffusion
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/llada2
title: LLaDA2
- local: api/pipelines/longcat_image
title: LongCat-Image
- local: api/pipelines/lumina2
@@ -714,6 +718,8 @@
- sections:
- local: api/schedulers/overview
title: Overview
- local: api/schedulers/block_refinement
title: BlockRefinementScheduler
- local: api/schedulers/cm_stochastic_iterative
title: CMStochasticIterativeScheduler
- local: api/schedulers/ddim_cogvideox
60 changes: 60 additions & 0 deletions docs/source/en/api/pipelines/block_refinement.md
@@ -0,0 +1,60 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Block Refinement

`BlockRefinementPipeline` performs block-wise iterative refinement over a masked token template, sampling and
committing tokens based on confidence.

## Config defaults

You can set default sampling parameters when creating the pipeline. If an argument to `__call__` is left as
`None`, the pipeline falls back to the corresponding value in `pipe.config`.

```py
from diffusers import BlockRefinementPipeline, BlockRefinementScheduler

# `model` and `tokenizer` are a trained token-diffusion model and its
# tokenizer, loaded beforehand (for example with `transformers`).
scheduler = BlockRefinementScheduler()
pipe = BlockRefinementPipeline(
    model=model,
    scheduler=scheduler,
    tokenizer=tokenizer,
)

out = pipe(
    prompt="Explain gradient descent.",
    gen_length=256,
    block_length=32,
    steps=16,
    temperature=0.8,
)
print(out.texts[0])
```

## Callbacks

Callbacks run after each refinement step and can inspect or override the current tokens.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
cur_x = callback_kwargs["cur_x"]
# Inspect or modify `cur_x` here.
return {"cur_x": cur_x}

out = pipe(
prompt="Write a short poem.",
callback_on_step_end=on_step_end,
callback_on_step_end_tensor_inputs=["cur_x"],
)
```

## BlockRefinementPipeline
[[autodoc]] BlockRefinementPipeline
- all
- __call__

## BlockRefinementPipelineOutput
[[autodoc]] pipelines.BlockRefinementPipelineOutput
23 changes: 23 additions & 0 deletions docs/source/en/api/pipelines/llada2.md
@@ -0,0 +1,23 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

`LLaDA2Pipeline` adapts block refinement sampling for LLaDA2-style token diffusion models.

## LLaDA2Pipeline
[[autodoc]] LLaDA2Pipeline
- all
- __call__

## LLaDA2PipelineOutput
[[autodoc]] pipelines.LLaDA2PipelineOutput
2 changes: 2 additions & 0 deletions docs/source/en/api/pipelines/overview.md
@@ -34,6 +34,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers and
| [AudioLDM2](audioldm2) | text2audio |
| [AuraFlow](aura_flow) | text2image |
| [BLIP Diffusion](blip_diffusion) | text2image |
| [Block Refinement](block_refinement) | text2text |
| [Bria 3.2](bria_3_2) | text2image |
| [CogVideoX](cogvideox) | text2video |
| [Consistency Models](consistency_models) | unconditional image generation |
@@ -62,6 +63,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
| [Latte](latte) | text2image |
| [LEDITS++](ledits_pp) | image editing |
| [LLaDA2](llada2) | text2text |
| [Lumina-T2X](lumina) | text2image |
| [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
| [MultiDiffusion](panorama) | text2image |
25 changes: 25 additions & 0 deletions docs/source/en/api/schedulers/block_refinement.md
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BlockRefinementScheduler

The `BlockRefinementScheduler` manages block-wise iterative refinement for discrete token diffusion. At each step it
commits the most confident tokens and optionally edits already-committed tokens when the model predicts a different
token with high confidence.

This scheduler is used by [`BlockRefinementPipeline`] and [`LLaDA2Pipeline`].
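
The commit rule can be sketched in plain PyTorch. This is only an illustration of the idea, not the
scheduler's actual implementation; `commit_by_confidence` and its arguments are hypothetical names:

```py
import torch


def commit_by_confidence(logits, ids, mask_token_id, num_to_commit):
    # Confidence = probability of the argmax token at each position.
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    # Only still-masked positions are candidates for committing.
    masked = ids == mask_token_id
    conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
    # Commit the `num_to_commit` most confident masked positions;
    # everything else stays masked for later refinement steps.
    top = conf.topk(num_to_commit, dim=-1).indices
    out = ids.clone()
    out.scatter_(-1, top, pred.gather(-1, top))
    return out
```

Already-committed positions are excluded by forcing their confidence below any real probability, so
repeated calls fill the block one batch of confident tokens at a time.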

## BlockRefinementScheduler
[[autodoc]] BlockRefinementScheduler

## BlockRefinementSchedulerOutput
[[autodoc]] schedulers.scheduling_block_refinement.BlockRefinementSchedulerOutput
72 changes: 72 additions & 0 deletions examples/discrete_diffusion/README.md
@@ -0,0 +1,72 @@
# Discrete Token Diffusion (Experimental)

This folder contains **training and sampling examples** for *discrete diffusion over token IDs* (language-model style), built to follow the `diffusers` + `accelerate` training conventions.

## Block refinement (commit-by-confidence)

Block refinement iteratively generates text in fixed-size blocks. At each step the model predicts all tokens in the block, commits the most confident ones, and re-masks the rest for further refinement.
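
The loop above can be sketched in a few lines of PyTorch (an illustration under simplifying assumptions,
not the pipeline's actual code; `refine_block` and `logits_fn` are hypothetical names):

```python
import torch


def refine_block(logits_fn, ids, mask_id, steps, threshold=0.9):
    """Toy block-refinement loop. `logits_fn(ids)` returns per-position logits."""
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break  # every position has been committed
        probs = logits_fn(ids).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Commit confident predictions; always commit at least the single
        # most confident masked token so the loop makes progress.
        commit = masked & (conf >= threshold)
        if not commit.any():
            commit = torch.zeros_like(masked)
            commit.view(-1)[conf.argmax()] = True
        ids = torch.where(commit, pred, ids)
    return ids
```

Un-committed positions are simply left as the mask token, which is what "re-masks the rest" amounts to
in this sketch.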

### Train (Qwen causal LM)

```bash
accelerate launch examples/discrete_diffusion/train_block_refinement_qwen_cap.py \
--model_name_or_path Qwen/Qwen2.5-0.5B \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--text_column text \
--output_dir qwen-block-refinement-output \
--max_train_steps 1000 \
--prompt_length 32 \
--block_length 32 \
--lambda_conf 2.0 \
--conf_temperature 0.5
```

If you don't want to download a dataset, you can use random-token data:

```bash
accelerate launch examples/discrete_diffusion/train_block_refinement_qwen_cap.py \
--model_name_or_path Qwen/Qwen2.5-0.5B \
--output_dir qwen-block-refinement-output \
--use_dummy_data \
--num_dummy_samples 2048
```

### Sample

```bash
python examples/discrete_diffusion/sample_block_refinement.py \
--checkpoint_path qwen-block-refinement-output/final \
--device cuda \
--attention_mask_mode 2d \
--prompt "Write a short paragraph about diffusion models." \
--gen_length 128
```

For causal LMs that only support a 2D `attention_mask`, use `--attention_mask_mode 2d`.

## LLaDA2 sampling

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) uses block refinement with a masked language model backbone. The `LLaDA2Pipeline` wraps `BlockRefinementPipeline` with LLaDA2-specific defaults.

```bash
python examples/discrete_diffusion/sample_llada2.py \
--model_id inclusionAI/LLaDA-8B-Instruct \
--prompt "Write a short poem about the ocean." \
--gen_length 128 \
--steps 128
```

### LLaDA2.1 editing support

LLaDA2.1 models support post-mask token editing via `--editing_threshold`:

```bash
python examples/discrete_diffusion/sample_llada2.py \
--model_id inclusionAI/LLaDA2.1-8B-Instruct \
--prompt "Explain quantum computing in simple terms." \
--gen_length 256 \
--steps 256 \
--editing_threshold 0.4 \
--max_post_steps 2
```
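
A minimal sketch of what such an editing pass might look like (illustrative only; `maybe_edit` is a
hypothetical helper, not the pipeline's actual code):

```python
import torch


def maybe_edit(logits, ids, committed, editing_threshold):
    # Re-check committed positions: if the model now prefers a different token
    # with probability above `editing_threshold`, overwrite the earlier choice.
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    edit = committed & (pred != ids) & (conf > editing_threshold)
    return torch.where(edit, pred, ids)
```

A lower `--editing_threshold` makes the sampler more willing to revise earlier commitments, while
`--max_post_steps` bounds how many such passes run.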
67 changes: 67 additions & 0 deletions examples/discrete_diffusion/sample_block_refinement.py
@@ -0,0 +1,67 @@
#!/usr/bin/env python

import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementPipeline, BlockRefinementScheduler


def main():
parser = argparse.ArgumentParser(description="Sample with BlockRefinementPipeline using a transformers causal LM.")
parser.add_argument("--checkpoint_path", type=str, required=True)
parser.add_argument("--cache_dir", type=str, default=None)
parser.add_argument("--prompt", type=str, default="Write a short paragraph about diffusion models.")
parser.add_argument("--gen_length", type=int, default=128)
parser.add_argument("--block_length", type=int, default=32)
parser.add_argument("--steps", type=int, default=32)
parser.add_argument("--temperature", type=float, default=0.0)
parser.add_argument("--top_p", type=float, default=1.0)
parser.add_argument("--top_k", type=int, default=0)
parser.add_argument("--threshold", type=float, default=0.95)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu")
parser.add_argument("--attention_mask_mode", type=str, default="2d", choices=["auto", "4d", "2d", "none"])

args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.checkpoint_path, use_fast=True, cache_dir=args.cache_dir)
model = AutoModelForCausalLM.from_pretrained(
args.checkpoint_path,
torch_dtype=torch.bfloat16 if args.device.startswith("cuda") else torch.float32,
cache_dir=args.cache_dir,
)
model.to(args.device)
model.eval()

if tokenizer.mask_token_id is None:
raise ValueError("Tokenizer must have `mask_token_id` for block refinement sampling.")

scheduler = BlockRefinementScheduler()
pipe = BlockRefinementPipeline(model=model, scheduler=scheduler, tokenizer=tokenizer).to(args.device)
gen = torch.Generator(device=args.device).manual_seed(args.seed)

prompt_ids = tokenizer(args.prompt, return_tensors="pt")["input_ids"].to(args.device)
out = pipe(
prompt_ids=prompt_ids,
gen_length=int(args.gen_length),
block_length=int(args.block_length),
steps=int(args.steps),
temperature=float(args.temperature),
top_p=None if args.top_p >= 1.0 else float(args.top_p),
top_k=None if args.top_k <= 0 else int(args.top_k),
threshold=float(args.threshold),
eos_early_stop=True,
eos_token_id=int(tokenizer.eos_token_id) if tokenizer.eos_token_id is not None else None,
mask_token_id=int(tokenizer.mask_token_id),
attention_mask_mode=args.attention_mask_mode,
generator=gen,
return_text=True,
)

print(out.texts[0] if out.texts is not None else tokenizer.decode(out.sequences[0], skip_special_tokens=True))


if __name__ == "__main__":
main()