6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
@@ -500,6 +500,8 @@
title: AuraFlow
- local: api/pipelines/blip_diffusion
title: BLIP-Diffusion
- local: api/pipelines/block_refinement
title: Block Refinement
- local: api/pipelines/bria_3_2
title: Bria 3.2
- local: api/pipelines/bria_fibo
@@ -578,6 +580,8 @@
title: Latent Diffusion
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/llada2
title: LLaDA2
- local: api/pipelines/longcat_image
title: LongCat-Image
- local: api/pipelines/lumina2
@@ -714,6 +718,8 @@
- sections:
- local: api/schedulers/overview
title: Overview
- local: api/schedulers/block_refinement
title: BlockRefinementScheduler
- local: api/schedulers/cm_stochastic_iterative
title: CMStochasticIterativeScheduler
- local: api/schedulers/ddim_cogvideox
60 changes: 60 additions & 0 deletions docs/source/en/api/pipelines/block_refinement.md
@@ -0,0 +1,60 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Block Refinement

`BlockRefinementPipeline` performs block-wise iterative refinement over a masked token template, sampling and
committing tokens based on confidence.

## Config defaults

You can set default sampling parameters when creating the pipeline. If an argument to `__call__` is left as
`None`, the pipeline falls back to the corresponding value in `pipe.config`.

```py
from diffusers import BlockRefinementPipeline, BlockRefinementScheduler

# `model` and `tokenizer` are a trained token-diffusion model and its
# tokenizer, loaded beforehand (for example with `transformers`).
scheduler = BlockRefinementScheduler()
pipe = BlockRefinementPipeline(
    model=model,
    scheduler=scheduler,
    tokenizer=tokenizer,
)

out = pipe(
    prompt="Explain gradient descent.",
    gen_length=256,
    block_length=32,
    steps=16,
    temperature=0.8,
)
print(out.texts[0])
```

## Callbacks

Callbacks run after each refinement step and can inspect or override the current tokens.

```py
def on_step_end(pipe, step, timestep, callback_kwargs):
cur_x = callback_kwargs["cur_x"]
# Inspect or modify `cur_x` here.
return {"cur_x": cur_x}

out = pipe(
prompt="Write a short poem.",
callback_on_step_end=on_step_end,
callback_on_step_end_tensor_inputs=["cur_x"],
)
```

## BlockRefinementPipeline
[[autodoc]] BlockRefinementPipeline
- all
- __call__

## BlockRefinementPipelineOutput
[[autodoc]] pipelines.BlockRefinementPipelineOutput
23 changes: 23 additions & 0 deletions docs/source/en/api/pipelines/llada2.md
@@ -0,0 +1,23 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LLaDA2

`LLaDA2Pipeline` adapts block refinement sampling for LLaDA2-style token diffusion models.

## LLaDA2Pipeline
[[autodoc]] LLaDA2Pipeline
- all
- __call__

## LLaDA2PipelineOutput
[[autodoc]] pipelines.LLaDA2PipelineOutput
2 changes: 2 additions & 0 deletions docs/source/en/api/pipelines/overview.md
@@ -34,6 +34,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers and
| [AudioLDM2](audioldm2) | text2audio |
| [AuraFlow](aura_flow) | text2image |
| [BLIP Diffusion](blip_diffusion) | text2image |
| [Block Refinement](block_refinement) | text2text |
| [Bria 3.2](bria_3_2) | text2image |
| [CogVideoX](cogvideox) | text2video |
| [Consistency Models](consistency_models) | unconditional image generation |
@@ -62,6 +63,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
| [Latte](latte) | text2image |
| [LEDITS++](ledits_pp) | image editing |
| [LLaDA2](llada2) | text2text |
| [Lumina-T2X](lumina) | text2image |
| [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
| [MultiDiffusion](panorama) | text2image |
25 changes: 25 additions & 0 deletions docs/source/en/api/schedulers/block_refinement.md
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BlockRefinementScheduler

The `BlockRefinementScheduler` manages block-wise iterative refinement for discrete token diffusion. At each step it
commits the most confident tokens and optionally edits already-committed tokens when the model predicts a different
token with high confidence.

This scheduler is used by [`BlockRefinementPipeline`] and [`LLaDA2Pipeline`].
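
The commit rule can be sketched in plain PyTorch. This is only an illustration of the idea, not the
scheduler's actual implementation; `commit_by_confidence` and its arguments are hypothetical names:

```py
import torch


def commit_by_confidence(logits, ids, mask_token_id, num_to_commit):
    # Confidence = probability of the argmax token at each position.
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    # Only still-masked positions are candidates for committing.
    masked = ids == mask_token_id
    conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
    # Commit the `num_to_commit` most confident masked positions;
    # everything else stays masked for later refinement steps.
    top = conf.topk(num_to_commit, dim=-1).indices
    out = ids.clone()
    out.scatter_(-1, top, pred.gather(-1, top))
    return out
```

Already-committed positions are excluded by forcing their confidence below any real probability, so
repeated calls fill the block one batch of confident tokens at a time.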

## BlockRefinementScheduler
[[autodoc]] BlockRefinementScheduler

## BlockRefinementSchedulerOutput
[[autodoc]] schedulers.scheduling_block_refinement.BlockRefinementSchedulerOutput
72 changes: 72 additions & 0 deletions examples/discrete_diffusion/README.md
@@ -0,0 +1,72 @@
# Discrete Token Diffusion (Experimental)

This folder contains **training and sampling examples** for *discrete diffusion over token IDs* (language-model style), built to follow the `diffusers` + `accelerate` training conventions.

## Block refinement (commit-by-confidence)

Block refinement iteratively generates text in fixed-size blocks. At each step the model predicts all tokens in the block, commits the most confident ones, and re-masks the rest for further refinement.
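
The loop above can be sketched in a few lines of PyTorch (an illustration under simplifying assumptions,
not the pipeline's actual code; `refine_block` and `logits_fn` are hypothetical names):

```python
import torch


def refine_block(logits_fn, ids, mask_id, steps, threshold=0.9):
    """Toy block-refinement loop. `logits_fn(ids)` returns per-position logits."""
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break  # every position has been committed
        probs = logits_fn(ids).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Commit confident predictions; always commit at least the single
        # most confident masked token so the loop makes progress.
        commit = masked & (conf >= threshold)
        if not commit.any():
            commit = torch.zeros_like(masked)
            commit.view(-1)[conf.argmax()] = True
        ids = torch.where(commit, pred, ids)
    return ids
```

Un-committed positions are simply left as the mask token, which is what "re-masks the rest" amounts to
in this sketch.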

### Train (Qwen causal LM)

```bash
accelerate launch examples/discrete_diffusion/train_block_refinement_qwen_cap.py \
--model_name_or_path Qwen/Qwen2.5-0.5B \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--text_column text \
--output_dir qwen-block-refinement-output \
--max_train_steps 1000 \
--prompt_length 32 \
--block_length 32 \
--lambda_conf 2.0 \
--conf_temperature 0.5
```

If you don't want to download a dataset, you can use random-token data:

```bash
accelerate launch examples/discrete_diffusion/train_block_refinement_qwen_cap.py \
--model_name_or_path Qwen/Qwen2.5-0.5B \
--output_dir qwen-block-refinement-output \
--use_dummy_data \
--num_dummy_samples 2048
```

### Sample

```bash
python examples/discrete_diffusion/sample_block_refinement.py \
--checkpoint_path qwen-block-refinement-output/final \
--device cuda \
--attention_mask_mode 2d \
--prompt "Write a short paragraph about diffusion models." \
--gen_length 128
```

For causal LMs that only support a 2D `attention_mask`, use `--attention_mask_mode 2d`.

## LLaDA2 sampling

[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) uses block refinement with a masked language model backbone. The `LLaDA2Pipeline` wraps `BlockRefinementPipeline` with LLaDA2-specific defaults.

```bash
python examples/discrete_diffusion/sample_llada2.py \
--model_id inclusionAI/LLaDA-8B-Instruct \
--prompt "Write a short poem about the ocean." \
--gen_length 128 \
--steps 128
```

### LLaDA2.1 editing support

LLaDA2.1 models support post-mask token editing via `--editing_threshold`:

```bash
python examples/discrete_diffusion/sample_llada2.py \
--model_id inclusionAI/LLaDA2.1-8B-Instruct \
--prompt "Explain quantum computing in simple terms." \
--gen_length 256 \
--steps 256 \
--editing_threshold 0.4 \
--max_post_steps 2
```
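
A minimal sketch of what such an editing pass might look like (illustrative only; `maybe_edit` is a
hypothetical helper, not the pipeline's actual code):

```python
import torch


def maybe_edit(logits, ids, committed, editing_threshold):
    # Re-check committed positions: if the model now prefers a different token
    # with probability above `editing_threshold`, overwrite the earlier choice.
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    edit = committed & (pred != ids) & (conf > editing_threshold)
    return torch.where(edit, pred, ids)
```

A lower `--editing_threshold` makes the sampler more willing to revise earlier commitments, while
`--max_post_steps` bounds how many such passes run.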
67 changes: 67 additions & 0 deletions examples/discrete_diffusion/sample_block_refinement.py
@@ -0,0 +1,67 @@
#!/usr/bin/env python

import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from diffusers import BlockRefinementPipeline, BlockRefinementScheduler


def main():
parser = argparse.ArgumentParser(description="Sample with BlockRefinementPipeline using a transformers causal LM.")
parser.add_argument("--checkpoint_path", type=str, required=True)
parser.add_argument("--cache_dir", type=str, default=None)
parser.add_argument("--prompt", type=str, default="Write a short paragraph about diffusion models.")
parser.add_argument("--gen_length", type=int, default=128)
parser.add_argument("--block_length", type=int, default=32)
parser.add_argument("--steps", type=int, default=32)
parser.add_argument("--temperature", type=float, default=0.0)
parser.add_argument("--top_p", type=float, default=1.0)
parser.add_argument("--top_k", type=int, default=0)
parser.add_argument("--threshold", type=float, default=0.95)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu")
parser.add_argument("--attention_mask_mode", type=str, default="2d", choices=["auto", "4d", "2d", "none"])

args = parser.parse_args()

tokenizer = AutoTokenizer.from_pretrained(args.checkpoint_path, use_fast=True, cache_dir=args.cache_dir)
model = AutoModelForCausalLM.from_pretrained(
args.checkpoint_path,
torch_dtype=torch.bfloat16 if args.device.startswith("cuda") else torch.float32,
cache_dir=args.cache_dir,
)
model.to(args.device)
model.eval()

if tokenizer.mask_token_id is None:
raise ValueError("Tokenizer must have `mask_token_id` for block refinement sampling.")

scheduler = BlockRefinementScheduler()
pipe = BlockRefinementPipeline(model=model, scheduler=scheduler, tokenizer=tokenizer).to(args.device)
gen = torch.Generator(device=args.device).manual_seed(args.seed)

prompt_ids = tokenizer(args.prompt, return_tensors="pt")["input_ids"].to(args.device)
out = pipe(
prompt_ids=prompt_ids,
gen_length=int(args.gen_length),
block_length=int(args.block_length),
steps=int(args.steps),
temperature=float(args.temperature),
top_p=None if args.top_p >= 1.0 else float(args.top_p),
top_k=None if args.top_k <= 0 else int(args.top_k),
threshold=float(args.threshold),
eos_early_stop=True,
eos_token_id=int(tokenizer.eos_token_id) if tokenizer.eos_token_id is not None else None,
mask_token_id=int(tokenizer.mask_token_id),
attention_mask_mode=args.attention_mask_mode,
generator=gen,
return_text=True,
)

print(out.texts[0] if out.texts is not None else tokenizer.decode(out.sequences[0], skip_special_tokens=True))


if __name__ == "__main__":
main()