
Diffulex


Diffulex is a Paged Attention-based inference framework for accelerated dLLM decoding that is easy to develop against and to extend. Its design hides the complexity of underlying KV-cache management, parallel-strategy scheduling, and memory optimization. By providing a clean, unified API along with flexible inference-strategy configurations (e.g., D2F, Block Diffusion, Fast-dLLM), Diffulex lets developers focus on model inference logic and business requirements while maintaining production-level inference performance and resource-utilization efficiency.
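Paged Attention manages the KV cache in fixed-size blocks mapped through a per-sequence block table, so memory is allocated on demand rather than reserved up front for the maximum sequence length. The sketch below illustrates that bookkeeping idea only; `BlockAllocator` and its methods are hypothetical names for this example, not part of the Diffulex API:

```python
# Illustrative sketch of paged KV-cache bookkeeping (not the Diffulex API).
class BlockAllocator:
    """Hands out fixed-size KV-cache blocks from a shared pool."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))

    def blocks_needed(self, num_tokens: int) -> int:
        # Ceiling division: tokens are packed into fixed-size blocks.
        return -(-num_tokens // self.block_size)

    def allocate(self, num_tokens: int) -> list[int]:
        n = self.blocks_needed(num_tokens)
        if n > len(self.free_blocks):
            raise MemoryError("KV cache exhausted")
        blocks, self.free_blocks = self.free_blocks[:n], self.free_blocks[n:]
        return blocks  # this list acts as the sequence's block table

    def free(self, block_table: list[int]) -> None:
        # Returned blocks become available to other sequences immediately.
        self.free_blocks.extend(block_table)


allocator = BlockAllocator(num_blocks=8, block_size=16)
table = allocator.allocate(num_tokens=40)  # 40 tokens -> 3 blocks of 16
print(table)  # → [0, 1, 2]
```

Because blocks are freed as soon as a sequence finishes, many concurrent sequences can share one pool without fragmenting GPU memory.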

Latest News

  • 12/22/2025 ✨: We are excited to announce that Diffulex, a Paged Attention-based dLLM accelerated decoding inference framework, is now open source and available to the public!

Tested Devices

Although Diffulex aims to be portable across a range of devices, it has been specifically tested and validated on the following NVIDIA GPUs: H200, A100, RTX 4090, and RTX 3090.

Installation

Install from Source (with pip)

Currently, installing from source is the only way to get started:

uv pip install -e .

Quick Start

Here's a simple example to get started with Diffulex:

from diffulex import Diffulex, SamplingParams
from transformers import AutoTokenizer

# Initialize the Diffulex engine
model_path = "/path/to/your/model"
llm = Diffulex(
    model_path,
    model_name="fast_dllm_v2",  # or "dream", "llada", etc.
    tensor_parallel_size=1,
    data_parallel_size=1,
    gpu_memory_utilization=0.25,
    max_model_len=2048,
    decoding_strategy="block_diffusion",  # or "d2f", "fast_dllm"
    mask_token_id=151665,  # model-specific mask token ID
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=256,
)

# Prepare prompts
prompts = [
    "Question: What is the capital of France? Answer:",
    "Question: Explain quantum computing in simple terms. Answer:",
]

# Generate responses
outputs = llm.generate(prompts, sampling_params)

# Process results
for output in outputs:
    print(f"Generated text: {output['text']}")
    print(f"Number of diffusion steps: {output['n_diff_steps']}")
    print(f"Token IDs: {output['token_ids']}")

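Each output dict carries the generated `text`, the number of diffusion steps taken (`n_diff_steps`), and the decoded `token_ids`. One useful derived metric is tokens decoded per diffusion step, which captures the parallel-decoding speedup of a dLLM. The snippet below assumes output dicts shaped as in the loop above, mocked here since it does not run the engine:

```python
# Summarize decoding efficiency from output dicts shaped like the ones
# returned by llm.generate() (mocked here for illustration).
outputs = [
    {"text": "Paris.", "n_diff_steps": 8, "token_ids": [40, 41, 42, 43]},
    {"text": "Qubits ...", "n_diff_steps": 32, "token_ids": list(range(64))},
]

for out in outputs:
    # A ratio above 1.0 means multiple tokens were decoded per step.
    tokens_per_step = len(out["token_ids"]) / out["n_diff_steps"]
    print(f"{len(out['token_ids'])} tokens in {out['n_diff_steps']} steps "
          f"({tokens_per_step:.2f} tokens/step)")
```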
For more examples, check out the examples directory.

Upcoming Features

Check our Diffulex v0.0.1 release plan for upcoming features.

Join the Discussion

You are welcome to join our Discord community for discussions, support, and collaboration!

Join our Discord

Acknowledgments

We would like to express our gratitude to Nano-vLLM, which serves as the primary codebase foundation for this project, and vLLM, from which we draw the core architectural concepts, particularly the Paged Attention mechanism. The initial version of this project was mainly developed by Yijie Jin with supervision from Prof. Zhijie Deng at Shanghai Jiao Tong University.
