Fast-dLLM

📄 Paper: Fast-dLLM | 💻 Code: github.com/NVlabs/Fast-dLLM

Resources and examples for running inference and evaluation on LLaDA and Dream with Fast-dLLM.

Table of Contents

- Files
- Inference
- Evaluation
- Evaluation results

Files

# Pipeline modules for Fast-dLLM
dllm/pipelines/fastdllm
├── __init__.py
├── dream/
│   ├── __init__.py
│   ├── models/
│   │   ├── configuration_dream.py  # Fast-dLLM Dream model configuration
│   │   └── modeling_dream.py       # Fast-dLLM Dream model architecture
│   ├── sampler.py                  # Fast-dLLM Dream inference module
│   └── eval.py                     # Fast-dLLM Dream evaluation module
└── llada/
    ├── __init__.py
    ├── models/
    │   ├── configuration_llada.py  # Fast-dLLM LLaDA model configuration
    │   └── modeling_llada.py       # Fast-dLLM LLaDA model architecture
    ├── sampler.py                  # Fast-dLLM LLaDA inference module
    └── eval.py                     # Fast-dLLM LLaDA evaluation module

# Example entry points for inference and evaluation
examples/fastdllm
├── README.md                       # Documentation (you are here)
├── dream/
│   ├── sample.py                   # Fast-dLLM Dream inference example
│   └── eval.sh                     # Fast-dLLM Dream evaluation example
└── llada/
    ├── sample.py                   # Fast-dLLM LLaDA inference example
    └── eval.sh                     # Fast-dLLM LLaDA evaluation example

Inference

The implementation matches the original Fast-dLLM: identical hyperparameters yield identical outputs.

Sampling with the Fast-dLLM LLaDA sampler (e.g. prefix cache + confidence threshold):

# Use --use_cache (none/prefix/dual) for cache scheme, --threshold for confidence-based sampling
python examples/fastdllm/llada/sample.py --model_name_or_path "GSAI-ML/LLaDA-8B-Instruct" --use_cache prefix --threshold 0.9

Sampling with the Fast-dLLM Dream sampler (e.g. prefix cache + confidence-threshold decoding):

# Use --use_cache (none/prefix/dual) for cache scheme, --alg (entropy/confidence_threshold) and --threshold for decoding
python examples/fastdllm/dream/sample.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --use_cache prefix --alg confidence_threshold --threshold 0.9
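The confidence-threshold decoding enabled by `--threshold` above can be sketched as follows. This is a simplified illustration of the parallel-unmasking rule (unmask every still-masked position whose top-token probability exceeds the threshold, always keeping at least the single most confident one so decoding makes progress), not the exact Fast-dLLM implementation; the function name and list-based interface are ours.

```python
def select_positions_to_unmask(probs, masked, threshold):
    """Pick which masked positions to decode in the current step.

    probs:     list of per-position probability distributions (lists of floats)
    masked:    list of bools, True where a position is still masked
    threshold: confidence cutoff for parallel unmasking (e.g. 0.9)
    Returns a list of bools marking the positions to unmask this step.
    """
    # Confidence = top-1 probability; already-decoded positions are ineligible.
    confidence = [max(p) if m else -1.0 for p, m in zip(probs, masked)]
    unmask = [c > threshold for c in confidence]
    if not any(unmask):
        # Always make progress: unmask the single most confident position.
        unmask[confidence.index(max(confidence))] = True
    return unmask
```

With block-wise decoding (the `block_size` option used in evaluation below), this rule is applied within the current block only.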

Evaluation

Optionally, read Evaluation setup before running evaluation.

For example, to evaluate LLaDA-8B-Instruct or Dream-v0-Base-7B on GSM8K with 4 GPUs, run:

# Use model_args to pass Fast-dLLM sampling options (use_cache, threshold, etc.).
accelerate launch --num_processes 4 \
    dllm/pipelines/fastdllm/llada/eval.py \
    --tasks "gsm8k" \
    --num_fewshot 5 \
    --model "fastdllm_llada" \
    --apply_chat_template \
    --model_args "pretrained=GSAI-ML/LLaDA-8B-Instruct,use_cache=prefix,threshold=0.9,max_new_tokens=256,steps=256,block_size=32,suppress_tokens=[],begin_suppress_tokens=[]"

accelerate launch --num_processes 4 \
    dllm/pipelines/fastdllm/dream/eval.py \
    --tasks "gsm8k" \
    --num_fewshot 5 \
    --model "fastdllm_dream" \
    --apply_chat_template \
    --model_args "pretrained=Dream-org/Dream-v0-Base-7B,use_cache=prefix,max_new_tokens=256,steps=256,block_size=32,alg=confidence_threshold,threshold=0.9,dtype=bfloat16,add_bos_token=True"
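The `--model_args` string follows the lm-evaluation-harness convention of comma-separated `key=value` pairs. A minimal parser sketch (ours, not the harness's actual implementation) shows how such a string maps to sampler options; it naively splits on commas, which holds for the examples above since values like `suppress_tokens=[]` contain no inner commas.

```python
def parse_model_args(arg_string):
    """Split an lm-eval style "k1=v1,k2=v2" string into a dict of strings.

    Naive sketch: assumes values themselves contain no commas.
    """
    args = {}
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        args[key.strip()] = value.strip()
    return args

opts = parse_model_args("pretrained=GSAI-ML/LLaDA-8B-Instruct,use_cache=prefix,threshold=0.9")
# opts["use_cache"] == "prefix", opts["threshold"] == "0.9"
```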

To evaluate LLaDA-8B-Instruct and Dream-v0-Base-7B with the Fast-dLLM sampler across all cache settings and benchmarks, run:

bash examples/fastdllm/llada/eval.sh --model_name_or_path "GSAI-ML/LLaDA-8B-Instruct" --instruct True --num_gpu 1
bash examples/fastdllm/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False --num_gpu 1

Evaluation results

Reproduced results are measured with our framework; Official results are taken from the original paper. All evaluation settings follow the Fast-dLLM repository with minor modifications; we additionally support the MBPP and Minerva-Math benchmarks, which the original repo does not provide.

Each cell reports Acc / Tok/s (speedup over the baseline), for generation lengths 256 and 512.

| Benchmark | Source | Baseline (256) | +Cache (256) | +Parallel (256) | +Cache & Parallel (256) | Baseline (512) | +Cache (512) | +Parallel (512) | +Cache & Parallel (512) |
|---|---|---|---|---|---|---|---|---|---|
| GSM8K | Official | 79.3 / 6.7 (1.0×) | 79.5 / 21.2 (3.2×) | 79.2 / 16.5 (2.5×) | 78.5 / 54.4 (8.1×) | 77.5 / 3.2 (1.0×) | 77.0 / 10.4 (3.3×) | 77.6 / 18.6 (5.8×) | 77.2 / 35.3 (11.0×) |
| GSM8K | Reproduced | 78.0 / 8.1 (1.0×) | 78.2 / 25.6 (3.2×) | 78.9 / 18.4 (2.3×) | 78.0 / 52.8 (6.5×) | 81.1 / 6.7 (1.0×) | 76.0 / 19.9 (3.0×) | 77.6 / 21.8 (3.3×) | 76.6 / 51.7 (7.8×) |
| MATH | Official | 33.5 / 9.1 (1.0×) | 33.3 / 23.7 (2.6×) | 33.4 / 24.8 (2.7×) | 33.2 / 51.7 (5.7×) | 37.2 / 8.0 (1.0×) | 36.2 / 19.7 (2.5×) | 36.8 / 23.8 (3.0×) | 36.0 / 47.1 (5.9×) |
| MATH | Reproduced | 38.3 / 9.7 (1.0×) | 37.6 / 26.4 (2.7×) | 38.6 / 19.6 (2.0×) | 37.5 / 49.0 (5.0×) | 42.4 / 7.4 (1.0×) | 41.9 / 21.1 (2.9×) | 42.5 / 19.8 (2.7×) | 41.8 / 44.8 (6.1×) |
| HumanEval | Official | 41.5 / 30.5 (1.0×) | 42.7 / 40.7 (1.3×) | 43.9 / 101.5 (3.3×) | 43.3 / 114.1 (3.7×) | 43.9 / 18.4 (1.0×) | 45.7 / 29.3 (1.6×) | 43.3 / 57.1 (3.1×) | 44.5 / 73.7 (4.0×) |
| HumanEval | Reproduced | 38.4 / 18.8 (1.0×) | 36.0 / 27.6 (1.5×) | 39.6 / 53.0 (2.8×) | 36.0 / 68.1 (3.6×) | 48.2 / 13.0 (1.0×) | 41.5 / 23.3 (1.8×) | 50.6 / 36.3 (2.8×) | 41.5 / 55.8 (4.3×) |
| MBPP | Official | 29.4 / 6.0 (1.0×) | 29.6 / 17.0 (2.8×) | 28.4 / 24.8 (4.1×) | 28.2 / 44.8 (7.5×) | 14.8 / 4.3 (1.0×) | 13.4 / 10.1 (2.3×) | 15.0 / 22.3 (5.1×) | 13.8 / 39.5 (9.2×) |
| MBPP | Reproduced | 36.4 / 9.3 (1.0×) | 38.0 / 26.2 (2.8×) | 29.0 / 17.6 (1.9×) | 37.8 / 44.7 (4.8×) | 32.2 / 7.7 (1.0×) | 22.0 / 20.8 (2.7×) | 7.6 / 21.0 (2.7×) | 21.4 / 43.9 (5.7×) |

Table 1: Evaluation results of LLaDA-8B-Instruct with Fast-dLLM.

Each cell reports Acc / Tok/s (speedup over the baseline), for generation lengths 256 and 512.

| Benchmark | Source | Baseline (256) | +Cache (256) | +Parallel (256) | +Cache & Parallel (256) | Baseline (512) | +Cache (512) | +Parallel (512) | +Cache & Parallel (512) |
|---|---|---|---|---|---|---|---|---|---|
| GSM8K | Official | 75.0 / 9.1 (1.0×) | 74.3 / 32.5 (3.6×) | 74.2 / 14.2 (1.6×) | 74.8 / 48.2 (5.3×) | 76.0 / 7.7 (1.0×) | 74.3 / 25.6 (3.3×) | 73.4 / 14.6 (1.9×) | 74.0 / 42.9 (5.6×) |
| GSM8K | Reproduced | 75.4 / 9.0 (1.0×) | 75.0 / 32.9 (3.7×) | 72.6 / 12.1 (1.4×) | 74.2 / 42.2 (4.7×) | 75.7 / 7.6 (1.0×) | 73.8 / 25.6 (3.4×) | 72.7 / 11.8 (1.6×) | 74.5 / 33.7 (4.4×) |
| MATH | Official | 38.4 / 11.4 (1.0×) | 36.8 / 34.3 (3.0×) | 37.9 / 27.3 (2.4×) | 37.6 / 66.8 (5.9×) | 39.8 / 9.6 (1.0×) | 38.0 / 26.8 (2.8×) | 39.5 / 31.6 (3.2×) | 39.3 / 63.3 (6.5×) |
| MATH | Reproduced | 31.5 / 25.1 (1.0×) | 33.3 / 36.8 (1.5×) | 23.5 / 52.1 (2.1×) | 31.1 / 76.7 (3.1×) | 39.2 / 15.8 (1.0×) | 39.2 / 28.1 (1.8×) | 32.0 / 25.5 (1.6×) | 38.9 / 46.3 (2.9×) |
| HumanEval | Official | 49.4 / 23.3 (1.0×) | 53.7 / 35.2 (1.5×) | 49.4 / 45.6 (2.0×) | 54.3 / 62.0 (2.8×) | 54.3 / 16.3 (1.0×) | 54.9 / 27.8 (1.7×) | 51.8 / 29.8 (1.8×) | 54.3 / 52.8 (3.2×) |
| HumanEval | Reproduced | 57.9 / 14.0 (1.0×) | 53.7 / 33.8 (2.4×) | 51.2 / 21.0 (1.5×) | 53.1 / 43.0 (3.1×) | 54.9 / 10.4 (1.0×) | 54.9 / 26.3 (2.5×) | 50.6 / 16.4 (1.6×) | 54.3 / 37.4 (3.6×) |
| MBPP | Official | 56.6 / 11.2 (1.0×) | 53.2 / 34.5 (3.1×) | 53.8 / 31.8 (2.8×) | 56.4 / 76.0 (6.8×) | 55.6 / 9.4 (1.0×) | 53.8 / 26.7 (2.8×) | 55.4 / 37.6 (4.0×) | 55.2 / 73.6 (7.8×) |
| MBPP | Reproduced | 55.6 / 9.9 (1.0×) | 53.8 / 32.1 (3.3×) | 53.6 / 24.1 (2.5×) | 56.0 / 61.8 (6.3×) | 56.0 / 4.6 (1.0×) | 52.6 / 25.0 (5.4×) | 52.8 / 29.5 (6.4×) | 54.4 / 61.1 (13.3×) |

Table 2: Evaluation results of Dream-v0-Base-7B with Fast-dLLM.
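The × factors in both tables are throughput ratios against the Baseline column of the same row and length; as a quick sanity check, taking the GSM8K Official numbers from Table 1:

```python
def speedup(tok_per_s, baseline_tok_per_s):
    """Throughput speedup relative to the baseline configuration."""
    return tok_per_s / baseline_tok_per_s

# GSM8K Official, Len = 256: baseline 6.7 Tok/s, +Cache & Parallel 54.4 Tok/s
print(round(speedup(54.4, 6.7), 1))   # → 8.1, matching the table's 8.1×
# GSM8K Official, Len = 512: baseline 3.2 Tok/s, +Cache & Parallel 35.3 Tok/s
print(round(speedup(35.3, 3.2), 1))   # → 11.0, matching the table's 11.0×
```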