📄 Paper: Fast-dLLM | 💻 Code: github.com/NVlabs/Fast-dLLM
Resources and examples for running inference and evaluation on LLaDA and Dream with Fast-dLLM.
```
# Pipeline modules for Fast-dLLM
dllm/pipelines/fastdllm
├── __init__.py
├── dream/
│   ├── __init__.py
│   ├── models/
│   │   ├── configuration_dream.py  # Fast-dLLM Dream model configuration
│   │   └── modeling_dream.py       # Fast-dLLM Dream model architecture
│   ├── sampler.py                  # Fast-dLLM Dream inference module
│   └── eval.py                     # Fast-dLLM Dream evaluation module
└── llada/
    ├── __init__.py
    ├── models/
    │   ├── configuration_llada.py  # Fast-dLLM LLaDA model configuration
    │   └── modeling_llada.py       # Fast-dLLM LLaDA model architecture
    ├── sampler.py                  # Fast-dLLM LLaDA inference module
    └── eval.py                     # Fast-dLLM LLaDA evaluation module
```
```
# Example entry points for inference and evaluation
examples/fastdllm
├── README.md     # Documentation (you are here)
├── dream/
│   ├── sample.py # Fast-dLLM Dream inference example
│   └── eval.sh   # Fast-dLLM Dream evaluation example
└── llada/
    ├── sample.py # Fast-dLLM LLaDA inference example
    └── eval.sh   # Fast-dLLM LLaDA evaluation example
```
The implementation matches the original Fast-dLLM: with identical hyperparameters, it yields identical outputs.
Sampling with the Fast-dLLM LLaDA sampler (e.g. prefix cache + confidence-threshold decoding):

```shell
# Use --use_cache (none/prefix/dual) for the cache scheme, --threshold for confidence-based sampling
python examples/fastdllm/llada/sample.py --model_name_or_path "GSAI-ML/LLaDA-8B-Instruct" --use_cache prefix --threshold 0.9
```

Sampling with the Fast-dLLM Dream sampler (e.g. prefix cache + confidence-threshold decoding):
```shell
# Use --use_cache (none/prefix/dual) for the cache scheme, --alg (entropy/confidence_threshold) and --threshold for decoding
python examples/fastdllm/dream/sample.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --use_cache prefix --alg confidence_threshold --threshold 0.9
```

Read the (optional) Evaluation setup section before running evaluation.
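The `--threshold` option in both samplers enables Fast-dLLM's confidence-aware parallel decoding: within a block, every masked position whose top-token probability clears the threshold is decoded in the same step, and at least the single most confident position is decoded so each step makes progress. A minimal sketch of one such step on toy distributions (illustrative only; `parallel_decode_step` and its signature are not from the actual codebase):

```python
def parallel_decode_step(probs, threshold=0.9):
    """One confidence-aware parallel decoding step (toy sketch).

    probs: per-position token distributions (dicts token -> prob) for the
           currently masked positions of a block.
    Returns {position_index: token} for every position decoded this step.
    """
    # Most likely token and its confidence at each masked position.
    best = [max(p.items(), key=lambda kv: kv[1]) for p in probs]
    # Decode every position whose confidence clears the threshold ...
    decoded = {i: tok for i, (tok, conf) in enumerate(best) if conf > threshold}
    if not decoded:
        # ... but always decode at least the single most confident position,
        # so the step is guaranteed to make progress.
        i = max(range(len(best)), key=lambda i: best[i][1])
        decoded = {i: best[i][0]}
    return decoded

# Three masked positions: positions 0 and 2 are confident enough, 1 is not.
probs = [
    {"the": 0.95, "a": 0.05},
    {"cat": 0.60, "dog": 0.40},
    {"sat": 0.92, "ran": 0.08},
]
print(parallel_decode_step(probs, threshold=0.9))  # {0: 'the', 2: 'sat'}
```

A higher threshold decodes fewer tokens per step (closer to one-by-one decoding) but is more conservative; a lower threshold decodes more tokens in parallel at some risk to quality.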
For example, to evaluate LLaDA-8B-Instruct or Dream-v0-Base-7B on GSM8K with 4 GPUs, run:

```shell
# Use --model_args to pass Fast-dLLM sampling options (use_cache, threshold, etc.).
accelerate launch --num_processes 4 \
    dllm/pipelines/fastdllm/llada/eval.py \
    --tasks "gsm8k" \
    --num_fewshot 5 \
    --model "fastdllm_llada" \
    --apply_chat_template \
    --model_args "pretrained=GSAI-ML/LLaDA-8B-Instruct,use_cache=prefix,threshold=0.9,max_new_tokens=256,steps=256,block_size=32,suppress_tokens=[],begin_suppress_tokens=[]"
```
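The `--model_args` value is a single comma-separated string of `key=value` sampler options. A toy sketch of how such a string maps to a dict, to illustrate the format only (the real parsing is handled by lm-evaluation-harness, which also casts values to proper types; here every value stays a plain string):

```python
def parse_model_args(model_args: str) -> dict:
    """Toy parser for a comma-separated key=value string (format illustration
    only; not the actual lm-evaluation-harness implementation)."""
    parsed = {}
    for pair in model_args.split(","):
        key, _, value = pair.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed

args = parse_model_args(
    "pretrained=GSAI-ML/LLaDA-8B-Instruct,use_cache=prefix,threshold=0.9,block_size=32"
)
print(args["use_cache"], args["threshold"])  # prefix 0.9
```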
```shell
accelerate launch --num_processes 4 \
    dllm/pipelines/fastdllm/dream/eval.py \
    --tasks "gsm8k" \
    --num_fewshot 5 \
    --model "fastdllm_dream" \
    --apply_chat_template \
    --model_args "pretrained=Dream-org/Dream-v0-Base-7B,use_cache=prefix,max_new_tokens=256,steps=256,block_size=32,alg=confidence_threshold,threshold=0.9,dtype=bfloat16,add_bos_token=True"
```

To evaluate LLaDA-8B-Instruct and Dream-v0-Base-7B with the Fast-dLLM sampler across all cache settings and benchmarks, run:
```shell
bash examples/fastdllm/llada/eval.sh --model_name_or_path "GSAI-ML/LLaDA-8B-Instruct" --instruct True --num_gpu 1
bash examples/fastdllm/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False --num_gpu 1
```

Results marked *Reproduced* were evaluated with our framework, while results marked *Official* come from the original paper. All evaluation settings follow the Fast-dLLM repository, with minor modifications: we add support for the MBPP and Minerva-Math benchmarks, which are not provided in the original repo.
Each cell below reports Acc / Tok/s, with the throughput multiplier over the same-length baseline in parentheses.

**Len = 256**

| Benchmark | Source | Baseline | +Cache | +Parallel | +Cache & Parallel |
|---|---|---|---|---|---|
| GSM8K | Official | 79.3 / 6.7 (1.0×) | 79.5 / 21.2 (3.2×) | 79.2 / 16.5 (2.5×) | 78.5 / 54.4 (8.1×) |
| GSM8K | Reproduced | 78.0 / 8.1 (1.0×) | 78.2 / 25.6 (3.2×) | 78.9 / 18.4 (2.3×) | 78.0 / 52.8 (6.5×) |
| MATH | Official | 33.5 / 9.1 (1.0×) | 33.3 / 23.7 (2.6×) | 33.4 / 24.8 (2.7×) | 33.2 / 51.7 (5.7×) |
| MATH | Reproduced | 38.3 / 9.7 (1.0×) | 37.6 / 26.4 (2.7×) | 38.6 / 19.6 (2.0×) | 37.5 / 49.0 (5.0×) |
| HumanEval | Official | 41.5 / 30.5 (1.0×) | 42.7 / 40.7 (1.3×) | 43.9 / 101.5 (3.3×) | 43.3 / 114.1 (3.7×) |
| HumanEval | Reproduced | 38.4 / 18.8 (1.0×) | 36.0 / 27.6 (1.5×) | 39.6 / 53.0 (2.8×) | 36.0 / 68.1 (3.6×) |
| MBPP | Official | 29.4 / 6.0 (1.0×) | 29.6 / 17.0 (2.8×) | 28.4 / 24.8 (4.1×) | 28.2 / 44.8 (7.5×) |
| MBPP | Reproduced | 36.4 / 9.3 (1.0×) | 38.0 / 26.2 (2.8×) | 29.0 / 17.6 (1.9×) | 37.8 / 44.7 (4.8×) |

**Len = 512**

| Benchmark | Source | Baseline | +Cache | +Parallel | +Cache & Parallel |
|---|---|---|---|---|---|
| GSM8K | Official | 77.5 / 3.2 (1.0×) | 77.0 / 10.4 (3.3×) | 77.6 / 18.6 (5.8×) | 77.2 / 35.3 (11.0×) |
| GSM8K | Reproduced | 81.1 / 6.7 (1.0×) | 76.0 / 19.9 (3.0×) | 77.6 / 21.8 (3.3×) | 76.6 / 51.7 (7.8×) |
| MATH | Official | 37.2 / 8.0 (1.0×) | 36.2 / 19.7 (2.5×) | 36.8 / 23.8 (3.0×) | 36.0 / 47.1 (5.9×) |
| MATH | Reproduced | 42.4 / 7.4 (1.0×) | 41.9 / 21.1 (2.9×) | 42.5 / 19.8 (2.7×) | 41.8 / 44.8 (6.1×) |
| HumanEval | Official | 43.9 / 18.4 (1.0×) | 45.7 / 29.3 (1.6×) | 43.3 / 57.1 (3.1×) | 44.5 / 73.7 (4.0×) |
| HumanEval | Reproduced | 48.2 / 13.0 (1.0×) | 41.5 / 23.3 (1.8×) | 50.6 / 36.3 (2.8×) | 41.5 / 55.8 (4.3×) |
| MBPP | Official | 14.8 / 4.3 (1.0×) | 13.4 / 10.1 (2.3×) | 15.0 / 22.3 (5.1×) | 13.8 / 39.5 (9.2×) |
| MBPP | Reproduced | 32.2 / 7.7 (1.0×) | 22.0 / 20.8 (2.7×) | 7.6 / 21.0 (2.7×) | 21.4 / 43.9 (5.7×) |

Table 1: Evaluation results of LLaDA-8B-Instruct with Fast-dLLM.
**Len = 256**

| Benchmark | Source | Baseline | +Cache | +Parallel | +Cache & Parallel |
|---|---|---|---|---|---|
| GSM8K | Official | 75.0 / 9.1 (1.0×) | 74.3 / 32.5 (3.6×) | 74.2 / 14.2 (1.6×) | 74.8 / 48.2 (5.3×) |
| GSM8K | Reproduced | 75.4 / 9.0 (1.0×) | 75.0 / 32.9 (3.7×) | 72.6 / 12.1 (1.4×) | 74.2 / 42.2 (4.7×) |
| MATH | Official | 38.4 / 11.4 (1.0×) | 36.8 / 34.3 (3.0×) | 37.9 / 27.3 (2.4×) | 37.6 / 66.8 (5.9×) |
| MATH | Reproduced | 31.5 / 25.1 (1.0×) | 33.3 / 36.8 (1.5×) | 23.5 / 52.1 (2.1×) | 31.1 / 76.7 (3.1×) |
| HumanEval | Official | 49.4 / 23.3 (1.0×) | 53.7 / 35.2 (1.5×) | 49.4 / 45.6 (2.0×) | 54.3 / 62.0 (2.8×) |
| HumanEval | Reproduced | 57.9 / 14.0 (1.0×) | 53.7 / 33.8 (2.4×) | 51.2 / 21.0 (1.5×) | 53.1 / 43.0 (3.1×) |
| MBPP | Official | 56.6 / 11.2 (1.0×) | 53.2 / 34.5 (3.1×) | 53.8 / 31.8 (2.8×) | 56.4 / 76.0 (6.8×) |
| MBPP | Reproduced | 55.6 / 9.9 (1.0×) | 53.8 / 32.1 (3.3×) | 53.6 / 24.1 (2.5×) | 56.0 / 61.8 (6.3×) |

**Len = 512**

| Benchmark | Source | Baseline | +Cache | +Parallel | +Cache & Parallel |
|---|---|---|---|---|---|
| GSM8K | Official | 76.0 / 7.7 (1.0×) | 74.3 / 25.6 (3.3×) | 73.4 / 14.6 (1.9×) | 74.0 / 42.9 (5.6×) |
| GSM8K | Reproduced | 75.7 / 7.6 (1.0×) | 73.8 / 25.6 (3.4×) | 72.7 / 11.8 (1.6×) | 74.5 / 33.7 (4.4×) |
| MATH | Official | 39.8 / 9.6 (1.0×) | 38.0 / 26.8 (2.8×) | 39.5 / 31.6 (3.2×) | 39.3 / 63.3 (6.5×) |
| MATH | Reproduced | 39.2 / 15.8 (1.0×) | 39.2 / 28.1 (1.8×) | 32.0 / 25.5 (1.6×) | 38.9 / 46.3 (2.9×) |
| HumanEval | Official | 54.3 / 16.3 (1.0×) | 54.9 / 27.8 (1.7×) | 51.8 / 29.8 (1.8×) | 54.3 / 52.8 (3.2×) |
| HumanEval | Reproduced | 54.9 / 10.4 (1.0×) | 54.9 / 26.3 (2.5×) | 50.6 / 16.4 (1.6×) | 54.3 / 37.4 (3.6×) |
| MBPP | Official | 55.6 / 9.4 (1.0×) | 53.8 / 26.7 (2.8×) | 55.4 / 37.6 (4.0×) | 55.2 / 73.6 (7.8×) |
| MBPP | Reproduced | 56.0 / 4.6 (1.0×) | 52.6 / 25.0 (5.4×) | 52.8 / 29.5 (6.4×) | 54.4 / 61.1 (13.3×) |

Table 2: Evaluation results of Dream-v0-Base-7B with Fast-dLLM.
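The multiplier in parentheses next to each Tok/s figure is decoding throughput relative to the Baseline column at the same generation length. A quick sketch of the arithmetic, checked against the official GSM8K numbers for LLaDA-8B-Instruct at Len = 256 in Table 1:

```python
def speedup(tok_s: float, baseline_tok_s: float) -> float:
    """Throughput multiplier over the baseline, rounded to one decimal
    as reported in the tables above."""
    return round(tok_s / baseline_tok_s, 1)

# Baseline throughput is 6.7 Tok/s; +Cache reaches 21.2 Tok/s and
# +Cache & Parallel reaches 54.4 Tok/s.
print(speedup(21.2, 6.7))  # 3.2
print(speedup(54.4, 6.7))  # 8.1
```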