Comparisons between different FnBn compute block configurations show that more compute blocks result in higher precision . For example, the F8B0_W8MC0 configuration achieves the best Clip Score (33.007) and ImageReward (1.0333). The meaning of parameter configuration is as follows (such as F8B0_W8M0MC0_T0O1_R0.08): (Device : NVIDIA L20.)
F : Fn_compute_blocks, int, Specifies that DBCache uses thefirst n Transformer blocks to fit the information at time step t, enabling the calculation of a more stable L1 difference and delivering more accurate information to subsequent blocks.
B : Bn_compute_blocks, int, Further fuses approximate information in thelast n Transformer blocks to enhance prediction accuracy. These blocks act as an auto-scaler for approximate hidden states that use residual cache.
W : max_warmup_steps, int, DBCache does not apply the caching strategy when the number of running steps is less than or equal to this value, ensuring the model sufficiently learns basic features during warmup.
I : warmup_interval, int, defaults to 1, Skip interval in warmup steps, e.g., when warmup_interval is 2, only 0, 2, 4, ... steps in warmup steps will be computed, others will use dynamic cache.
M : max_cached_steps, int, DBCache disables the caching strategy when the previous cached steps exceed this value to prevent precision degradation.
MC : max_continuous_cached_steps, int, DBCache disables the caching strategy when the previous continuous cached steps exceed this value to prevent precision degradation. (namely, hybrid dynamic cache and static cache)
T : enable talyorseer or not (namely, hybrid taylorseer w/ dynamic cache - DBCache). DBCache acts as the Indicator to decide when to cache, while the Calibrator decides how to cache.
O : The taylorseer order, int, e.g., O1 means order 1.
R : The residual diff threshold of DBCache, range [0, 1.0)
Latency(s) : Recorded compute time (eager mode) that w/o other optimizations
TFLOPs : Recorded compute FLOPs using calflops 's calculate_flops API.
Note
Among all the accuracy indicators, the overall accuracy has slightly improved after using TaylorSeer.
📚Text2Image DrawBench: FLUX.1-dev
Config
Clip Score(↑)
ImageReward(↑)
PSNR(↑)
TFLOPs(↓)
SpeedUp(↑)
[FLUX.1 -dev]: 50 steps
32.9217
1.0412
INF
3726.87
1.00x
F8B0_W4MC0_T0O1_R0.08
32.9871
1.0370
33.8317
2064.81
1.80x
F8B0_W4MC2_T0O1_R0.12
32.9535
1.0185
32.7346
1935.73
1.93x
F8B0_W4MC3_T0O1_R0.12
32.9234
1.0085
32.5385
1816.58
2.05x
F4B0_W4MC3_T0O1_R0.12
32.8981
1.0130
31.8031
1507.83
2.47x
F4B0_W4MC4_T0O1_R0.12
32.8384
1.0065
31.5292
1400.08
2.66x
The comparison between cache-dit: DBCache and algorithms such as Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa is as follows. Now, in the comparison with a speedup ratio less than 3x , cache-dit achieved the best accuracy. Please check 📚How to Reproduce? for more details.
Method
TFLOPs(↓)
SpeedUp(↑)
ImageReward(↑)
Clip Score(↑)
[FLUX.1 -dev]: 50 steps
3726.87
1.00×
0.9898
32.404
[FLUX.1 -dev]: 60% steps
2231.70
1.67×
0.9663
32.312
Δ-DiT(N=2)
2480.01
1.50×
0.9444
32.273
Δ-DiT(N=3)
1686.76
2.21×
0.8721
32.102
[FLUX.1 -dev]: 34% steps
1264.63
3.13×
0.9453
32.114
Chipmunk
1505.87
2.47×
0.9936
32.776
FORA(N=3)
1320.07
2.82×
0.9776
32.266
DBCache(S)
1400.08
2.66×
1.0065
32.838
DuCa(N=5)
978.76
3.80×
0.9955
32.241
TaylorSeer(N=4,O=2)
1042.27
3.57×
0.9857
32.413
DBCache(S)+TS
1153.05
3.23×
1.0221
32.819
DBCache(M)
944.75
3.94×
0.9997
32.849
DBCache(M)+TS
944.75
3.94×
1.0107
32.865
FoCa(N=5): arxiv.2508.16211
893.54
4.16×
1.0029
32.948
[FLUX.1 -dev]: 22% steps
818.29
4.55×
0.8183
31.772
FORA(N=7)
670.14
5.55×
0.7418
31.519
ToCa(N=12)
644.70
5.77×
0.7155
31.808
DuCa(N=10)
606.91
6.13×
0.8382
31.759
TeaCache(l=1.2)
669.27
5.56×
0.7394
31.704
TaylorSeer(N=7,O=2)
670.44
5.54×
0.9128
32.128
DBCache(F)
651.90
5.72x
0.9271
32.552
FoCa(N=8): arxiv.2508.16211
596.07
6.24×
0.9502
32.706
DBCache(F)+TS
651.90
5.72x
0.9526
32.568
DBCache(U)+TS
505.47
7.37x
0.8645
32.719
NOTE: Except for DBCache, other performance data are referenced from the paper FoCa, arxiv.2508.16211 . The configurations of DBCache are listed as belows:
Algo
Configuration
Algo
Configuration
DBCache(S:Slow)
F=4,B=0,W=4,I=1,MC=4
DBCache(S:Slow)+TS:TaylorSeer
F=1,B=0,W=4,I=1,MC=4,O=1
DBCache(M:Medium)
F=1,B=0,W=4,I=1,MC=6
DBCache(M:Medium)+TS:TaylorSeer
F=1,B=0,W=4,I=1,MC=6,O=1
DBCache(F:Fast)
F=1,B=0,W=8,I=2,MC=8
DBCache(F:Fast)+TS:TaylorSeer
F=1,B=0,W=8,I=2,MC=8,O=1
DBCache(U:Ultra)
F=1,B=0,W=8,I=4,MC=10,
DBCache(U:Ultra)+TS:TaylorSeer
F=1,B=0,W=8,I=4,MC=10,O=1
📚Text2Image Distillation DrawBench: Qwen-Image-Lightning
Surprisingly, cache-dit: DBCache still works in the extremely few-step distill model. For example, Qwen-Image-Lightning w/ 4 steps , with the F16B16 configuration, the PSNR is 34.8163, the Clip Score is 35.6109, and the ImageReward is 1.2614. It maintained a relatively high precision.
Config
PSNR(↑)
Clip Score(↑)
ImageReward(↑)
TFLOPs(↓)
SpeedUp(↑)
[Lightning ]: 4 steps
INF
35.5797
1.2630
274.33
1.00x
F24B24_W2MC1_T0O1_R0.8
36.3242
35.6224
1.2630
264.74
1.04x
F16B16_W2MC1_T0O1_R0.8
34.8163
35.6109
1.2614
244.25
1.12x
F12B12_W2MC1_T0O1_R0.8
33.8953
35.6535
1.2549
234.63
1.17x
F8B8_W2MC1_T0O1_R0.8
33.1374
35.7284
1.2517
224.29
1.22x
F1B0_W2MC1_T0O1_R0.8
31.8317
35.6651
1.2397
206.90
1.33x
# install requirements
pip3 install git+https://github.com/openai/CLIP.git
pip3 install git+https://github.com/chengzegang/calculate-flops.pytorch.git
pip3 install image-reward
pip3 install git+https://github.com/vipshop/cache-dit.git
git clone https://github.com/vipshop/cache-dit.git
cd cache-dit/bench && mkdir tmp && mkdir log && mkdir hf_models && cd hf_models
# FLUX.1-dev
modelscope download black-forest-labs/FLUX.1-dev --local_dir ./FLUX.1-dev
hf download black-forest-labs/FLUX.1-dev --local-dir ./FLUX.1-dev
export FLUX_DIR=$PWD /FLUX.1-dev
# Qwen-Image-Lightning
modelscope download Qwen/Qwen-Image --local_dir ./Qwen-Image
modelscope download lightx2v/Qwen-Image-Lightning --local_dir ./Qwen-Image-Lightning
hf download Qwen/Qwen-Image --local-dir ./Qwen-Image
hf download lightx2v/Qwen-Image-Lightning --local-dir ./Qwen-Image-Lightning
export QWEN_IMAGE_DIR=$PWD /Qwen-Image
export QWEN_IMAGE_LIGHT_DIR=$PWD /Qwen-Image-Lightning
# Clip Score & Image Reward
modelscope download laion/CLIP-ViT-g-14-laion2B-s12B-b42K --local_dir ./CLIP-ViT-g-14-laion2B-s12B-b42K
modelscope download ZhipuAI/ImageReward --local_dir ./ImageReward
hf download laion/CLIP-ViT-g-14-laion2B-s12B-b42K --local-dir ./CLIP-ViT-g-14-laion2B-s12B-b42K
hf download ZhipuAI/ImageReward --local-dir ./ImageReward
export CLIP_MODEL_DIR=$PWD /CLIP-ViT-g-14-laion2B-s12B-b42K
export IMAGEREWARD_MODEL_DIR=$PWD /ImageReward
cd ..
# NOTE: The reported benchmark was run on NVIDIA L20 device.
# FLUX.1-dev DrawBench w/ low speedup ratio
export CUDA_VISIBLE_DEVICES=0
nohup bash bench.sh default > log/cache_dit_bench_default.log 2>&1 &
export CUDA_VISIBLE_DEVICES=1
nohup bash bench.sh taylorseer > log/cache_dit_bench_taylorseer.log 2>&1 &
bash ./metrics.sh
# FLUX.1-dev DrawBench w/ high speedup ratio
export CUDA_VISIBLE_DEVICES=0
nohup bash bench_fast.sh default > log/cache_dit_bench_fast.log 2>&1 &
export CUDA_VISIBLE_DEVICES=1
nohup bash bench_fast.sh taylorseer > log/cache_dit_bench_taylorseer_fast.log 2>&1 &
bash ./metrics_fast.sh
# Qwen-Image-Lightning DrawBench
export CUDA_VISIBLE_DEVICES=0,1
nohup bash bench_distill.sh 8_steps > log/cache_dit_bench_distill_8_steps.log 2>&1 &
export CUDA_VISIBLE_DEVICES=2,3
nohup bash bench_distill.sh 4_steps > log/cache_dit_bench_distill_4_steps.log 2>&1 &
bash ./metrics_distill.sh
📖ClipScore: DBCache(Slow) FnBn
Config
CLIP_SCORE
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
32.9217
42.74
1.00
3726.87
1.00
F1B0_W4M0MC0_R0.08_T0O0
33.0745
23.21
1.84
1729.60
2.15
F8B0_W4M0MC3_R0.32_T0O0
33.0055
22.07
1.94
1548.80
2.41
F8B0_W4M0MC4_R0.24_T0O0
32.9921
20.79
2.06
1421.96
2.62
F8B0_W4M0MC6_R0.32_T0O0
32.9804
19.38
2.21
1262.77
2.95
F4B0_W4M0MC4_R0.24_T0O0
32.9720
18.77
2.28
1231.01
3.03
F4B0_W4M0MC4_R0.32_T0O0
32.9680
18.10
2.36
1163.10
3.20
F4B0_W4M0MC6_R0.32_T0O0
32.8926
16.62
2.57
1024.52
3.64
F1B0_W4M0MC4_R0.32_T0O0
32.8640
16.30
2.62
1017.97
3.66
F1B0_W4M0MC6_R0.24_T0O0
32.8496
15.54
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T0O0
32.8098
15.65
2.73
942.92
3.95
📖ClipScore: DBCache(Slow) FnBn + TaylorSeer O(1)
Config
CLIP_SCORE
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
32.9217
42.92
1.00
3726.87
1.00
F1B0_W4M0MC0_R0.08_T1O1
33.0398
23.29
1.84
1730.70
2.15
F4B0_W4M0MC3_R0.12_T1O1
32.9795
21.36
2.01
1499.51
2.49
F1B0_W4M0MC6_R0.32_T1O1
32.9589
15.70
2.73
944.02
3.95
📖ClipScore: DBCache(Fast) FnBn
Config
CLIP_SCORE
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
C0_Q0_NONE
32.9616
44.61
1.00
3726.87
1.00
F1B0_W16I4M0MC10_R0.8_T0O0
32.6306
10.48
4.26
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T0O0
32.5522
11.26
3.96
651.90
5.72
F1B0_W8I4M0MC8_R0.8_T0O0
32.5420
10.46
4.26
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T0O0
32.4479
9.71
4.59
505.47
7.37
F1B0_W8I2M0MC8_R0.8_T0O0
32.4395
11.40
3.91
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T0O0
32.4207
10.62
4.20
578.69
6.44
📖ClipScore: DBCache(Fast) FnBn + TaylorSeer O(1)
Config
CLIP_SCORE
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
C0_Q0_NONE
32.9616
44.94
1.00
3726.87
1.00
F1B0_W16I4M0MC8_R0.8_T1O1
33.0988
11.42
3.94
651.90
5.72
F1B0_W16I4M0MC10_R0.8_T1O1
32.9764
10.65
4.22
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T1O1
32.7195
9.83
4.57
505.47
7.37
F1B0_W8I4M0MC8_R0.8_T1O1
32.6448
10.56
4.26
578.69
6.44
F1B0_W8I2M0MC8_R0.8_T1O1
32.5682
11.44
3.93
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T1O1
32.3999
10.66
4.22
578.69
6.44
📖ImageReward: DBCache(Slow) FnBn
Config
IMAGE_REWARD
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
1.0412
42.74
1.00
3726.87
1.00
F1B0_W4M0MC2_R0.16_T0O0
1.0461
21.29
2.01
1530.46
2.44
F1B0_W4M0MC2_R0.24_T0O0
1.0418
20.51
2.08
1457.25
2.56
F1B0_W4M0MC2_R0.2_T0O0
1.0418
20.53
2.08
1457.25
2.56
F1B0_W4M0MC2_R0.32_T0O0
1.0418
20.63
2.07
1457.25
2.56
F1B0_W4M0MC3_R0.16_T0O0
1.0360
19.09
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T0O0
1.0323
18.57
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T0O0
1.0232
17.75
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T0O0
1.0229
17.71
2.41
1153.05
3.23
F1B0_W4M0MC4_R0.24_T0O0
1.0096
16.98
2.52
1091.18
3.42
F1B0_W4M0MC4_R0.32_T0O0
1.0083
16.30
2.62
1017.97
3.66
F1B0_W4M0MC6_R0.24_T0O0
0.9997
15.54
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T0O0
0.9921
15.65
2.73
942.92
3.95
📖ImageReward: DBCache(Slow) FnBn + TaylorSeer O(1)
Config
IMAGE_REWARD
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
1.0412
42.92
1.00
3726.87
1.00
F1B0_W4M0MC0_R0.08_T1O1
1.0591
23.29
1.84
1730.70
2.15
F1B0_W4M0MC2_R0.16_T1O1
1.0389
21.35
2.01
1530.46
2.44
F1B0_W4M0MC2_R0.24_T1O1
1.0379
20.61
2.08
1457.25
2.56
F1B0_W4M0MC2_R0.2_T1O1
1.0379
20.59
2.08
1457.25
2.56
F1B0_W4M0MC2_R0.32_T1O1
1.0379
20.70
2.07
1457.25
2.56
F1B0_W4M0MC3_R0.12_T1O1
1.0292
20.70
2.07
1457.62
2.56
F1B0_W4M0MC3_R0.2_T1O1
1.0257
18.63
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.2_T1O1
1.0221
17.71
2.42
1153.05
3.23
F1B0_W4M0MC4_R0.32_T1O1
1.0149
16.33
2.63
1017.97
3.66
F1B0_W4M0MC6_R0.24_T1O1
1.0107
15.61
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T1O1
1.0025
15.70
2.73
944.02
3.95
📖ImageReward: DBCache(Fast) FnBn
Config
IMAGE_REWARD
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
C0_Q0_NONE
1.0445
44.61
1.00
3726.87
1.00
F1B0_W8I2M0MC8_R0.8_T0O0
0.9271
11.40
3.91
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T0O0
0.9234
10.62
4.20
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T0O0
0.9115
11.26
3.96
651.90
5.72
F1B0_W16I4M0MC10_R0.8_T0O0
0.8644
10.48
4.26
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T0O0
0.8301
10.46
4.26
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T0O0
0.8092
9.71
4.59
505.47
7.37
📖ImageReward: DBCache(Fast) FnBn + TaylorSeer O(1)
Config
IMAGE_REWARD
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
C0_Q0_NONE
1.0445
44.94
1.00
3726.87
1.00
F1B0_W8I2M0MC8_R0.8_T1O1
0.9526
11.44
3.93
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T1O1
0.9360
10.66
4.22
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T1O1
0.9285
11.42
3.94
651.90
5.72
F1B0_W8I4M0MC8_R0.8_T1O1
0.9088
10.56
4.26
578.69
6.44
F1B0_W16I4M0MC10_R0.8_T1O1
0.8941
10.65
4.22
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T1O1
0.8645
9.83
4.57
505.47
7.37
📖PSNR: DBCache(Slow) FnBn
Config
PSNR
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1.dev, 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T0O0
35.2008
27.87
1.53
2162.19
1.72
F8B0_W8M0MC2_R0.12_T0O0
34.7449
27.22
1.57
2072.18
1.80
F8B0_W8M0MC2_R0.16_T0O0
34.6659
26.34
1.62
2002.67
1.86
F8B0_W8M0MC2_R0.2_T0O0
34.6579
26.33
1.62
1994.67
1.87
F8B0_W8M0MC2_R0.24_T0O0
34.5399
25.62
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T0O0
34.4860
25.66
1.67
1933.17
1.93
F1B0_W4M0MC0_R0.08_T0O0
33.9639
23.21
1.84
1729.60
2.15
F1B0_W4M0MC2_R0.12_T0O0
33.1898
21.92
1.95
1604.78
2.32
F1B0_W4M0MC3_R0.12_T0O0
33.0037
20.61
2.07
1457.62
2.56
F1B0_W4M0MC4_R0.12_T0O0
32.9462
20.01
2.14
1401.61
2.66
F1B0_W4M0MC3_R0.16_T0O0
31.9664
19.09
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T0O0
31.9315
18.57
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T0O0
31.7174
17.75
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T0O0
31.7109
17.71
2.41
1153.05
3.23
F1B0_W4M0MC4_R0.24_T0O0
31.3081
16.98
2.52
1091.18
3.42
F1B0_W4M0MC6_R0.24_T0O0
31.0524
15.54
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T0O0
30.6637
15.65
2.73
942.92
3.95
📖PSNR: DBCache(Slow) FnBn + TaylorSeer O(1)
Config
PSNR
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T1O1
36.7296
28.06
1.53
2172.76
1.72
F8B0_W8M0MC2_R0.12_T1O1
36.1337
27.31
1.57
2074.10
1.80
F8B0_W8M0MC2_R0.16_T1O1
36.1062
26.46
1.62
2005.56
1.86
F8B0_W8M0MC2_R0.2_T1O1
36.0838
26.22
1.64
1985.38
1.88
F8B0_W8M0MC2_R0.24_T1O1
35.9808
25.64
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T1O1
35.9008
25.71
1.67
1933.17
1.93
F8B0_W8M0MC4_R0.12_T1O1
35.4352
25.59
1.68
1912.99
1.95
F8B0_W8M0MC3_R0.16_T1O1
34.7351
24.67
1.74
1815.62
2.05
F8B0_W8M0MC3_R0.2_T1O1
34.6739
24.46
1.75
1794.16
2.08
F8B0_W8M0MC3_R0.24_T1O1
34.6091
23.84
1.80
1740.99
2.14
F8B0_W8M0MC3_R0.32_T1O1
34.5803
23.98
1.79
1740.99
2.14
F1B0_W4M0MC0_R0.08_T1O1
34.5305
23.29
1.84
1730.70
2.15
F1B0_W4M0MC2_R0.12_T1O1
34.1546
22.07
1.94
1604.04
2.32
F1B0_W4M0MC4_R0.12_T1O1
33.8600
20.08
2.14
1404.17
2.65
F1B0_W4M0MC3_R0.16_T1O1
31.8628
19.20
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T1O1
31.8328
18.63
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T1O1
31.6688
17.82
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T1O1
31.6624
17.71
2.42
1153.05
3.23
F1B0_W4M0MC4_R0.24_T1O1
30.8172
17.04
2.52
1091.18
3.42
F1B0_W4M0MC6_R0.24_T1O1
30.5131
15.61
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T1O1
29.8860
15.70
2.73
944.02
3.95
📖PSNR: DBCache(Fast) FnBn
Config
PSNR
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T0O0
29.3428
11.40
3.91
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T0O0
29.2732
10.62
4.20
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T0O0
28.5724
11.26
3.96
651.90
5.72
F1B0_W16I4M0MC10_R0.8_T0O0
28.5644
10.48
4.26
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T0O0
28.5440
10.46
4.26
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T0O0
28.5201
9.71
4.59
505.47
7.37
📖PSNR: DBCache(Fast) FnBn + TaylorSeer O(1)
Config
PSNR
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T1O1
29.2251
11.44
3.93
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T1O1
28.9526
10.66
4.22
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T1O1
28.7110
10.56
4.26
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T1O1
28.6300
11.42
3.94
651.90
5.72
F1B0_W16I4M0MC10_R0.8_T1O1
28.5842
10.65
4.22
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T1O1
28.5654
9.83
4.57
505.47
7.37
📖SSIM: DBCache(Slow) FnBn
Config
SSIM
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T0O0
0.9131
27.87
1.53
2162.19
1.72
F8B0_W8M0MC2_R0.12_T0O0
0.9017
27.22
1.57
2072.18
1.80
F8B0_W8M0MC2_R0.16_T0O0
0.8999
26.34
1.62
2002.67
1.86
F8B0_W8M0MC2_R0.2_T0O0
0.8996
26.33
1.62
1994.67
1.87
F8B0_W8M0MC2_R0.24_T0O0
0.8953
25.62
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T0O0
0.8936
25.66
1.67
1933.17
1.93
F8B0_W8M0MC4_R0.12_T0O0
0.8858
25.41
1.68
1897.61
1.96
F8B0_W8M0MC3_R0.16_T0O0
0.8774
24.55
1.74
1811.77
2.06
F8B0_W8M0MC3_R0.2_T0O0
0.8761
24.56
1.74
1803.45
2.07
F1B0_W4M0MC0_R0.08_T0O0
0.8727
23.21
1.84
1729.60
2.15
F8B0_W8M0MC4_R0.2_T0O0
0.8571
23.40
1.83
1678.21
2.22
F1B0_W4M0MC2_R0.12_T0O0
0.8536
21.92
1.95
1604.78
2.32
F1B0_W4M0MC3_R0.12_T0O0
0.8521
20.61
2.07
1457.62
2.56
F1B0_W4M0MC4_R0.12_T0O0
0.8461
20.01
2.14
1401.61
2.66
F1B0_W4M0MC3_R0.16_T0O0
0.8092
19.09
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T0O0
0.8053
18.57
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T0O0
0.7964
17.75
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T0O0
0.7952
17.71
2.41
1153.05
3.23
F1B0_W4M0MC6_R0.24_T0O0
0.7620
15.54
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T0O0
0.7438
15.65
2.73
942.92
3.95
📖SSIM: DBCache(Slow) FnBn + TaylorSeer O(1)
Config
SSIM
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T1O1
0.9340
28.06
1.53
2172.76
1.72
F8B0_W8M0MC2_R0.12_T1O1
0.9201
27.31
1.57
2074.10
1.80
F8B0_W8M0MC2_R0.16_T1O1
0.9194
26.46
1.62
2005.56
1.86
F8B0_W8M0MC2_R0.2_T1O1
0.9189
26.22
1.64
1985.38
1.88
F8B0_W8M0MC2_R0.24_T1O1
0.9169
25.64
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T1O1
0.9157
25.71
1.67
1933.17
1.93
F8B0_W8M0MC3_R0.12_T1O1
0.9116
26.09
1.65
1974.49
1.89
F8B0_W8M0MC4_R0.12_T1O1
0.9098
25.59
1.68
1912.99
1.95
F8B0_W8M0MC3_R0.16_T1O1
0.8953
24.67
1.74
1815.62
2.05
F8B0_W8M0MC3_R0.2_T1O1
0.8926
24.46
1.75
1794.16
2.08
F8B0_W8M0MC3_R0.24_T1O1
0.8907
23.84
1.80
1740.99
2.14
F8B0_W8M0MC3_R0.32_T1O1
0.8905
23.98
1.79
1740.99
2.14
F1B0_W4M0MC0_R0.08_T1O1
0.8845
23.29
1.84
1730.70
2.15
F1B0_W4M0MC2_R0.12_T1O1
0.8802
22.07
1.94
1604.04
2.32
F1B0_W4M0MC3_R0.12_T1O1
0.8752
20.70
2.07
1457.62
2.56
F1B0_W4M0MC4_R0.12_T1O1
0.8745
20.08
2.14
1404.17
2.65
F1B0_W4M0MC3_R0.16_T1O1
0.8222
19.20
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T1O1
0.8198
18.63
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T1O1
0.8125
17.82
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T1O1
0.8119
17.71
2.42
1153.05
3.23
F1B0_W4M0MC4_R0.24_T1O1
0.7837
17.04
2.52
1091.18
3.42
F1B0_W4M0MC6_R0.24_T1O1
0.7718
15.61
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T1O1
0.7423
15.70
2.73
944.02
3.95
📖SSIM: DBCache(Fast) FnBn
Config
SSIM
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T0O0
0.6681
11.40
3.91
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T0O0
0.6528
10.62
4.20
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T0O0
0.6161
10.46
4.26
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T0O0
0.6066
11.26
3.96
651.90
5.72
F1B0_W8I4M0MC10_R0.8_T0O0
0.6065
9.71
4.59
505.47
7.37
F1B0_W16I4M0MC10_R0.8_T0O0
0.5977
10.48
4.26
578.69
6.44
📖SSIM: DBCache(Fast) FnBn + TaylorSeer O(1)
Config
SSIM
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T1O1
0.6653
11.44
3.93
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T1O1
0.6219
10.66
4.22
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T1O1
0.6194
10.56
4.26
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T1O1
0.6076
11.42
3.94
651.90
5.72
F1B0_W8I4M0MC10_R0.8_T1O1
0.5947
9.83
4.57
505.47
7.37
F1B0_W16I4M0MC10_R0.8_T1O1
0.5944
10.65
4.22
578.69
6.44
📖LPIPS: DBCache(Slow) FnBn
Config
LPIPS
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T0O0
0.0786
27.87
1.53
2162.19
1.72
F8B0_W8M0MC2_R0.12_T0O0
0.0895
27.22
1.57
2072.18
1.80
F8B0_W8M0MC2_R0.16_T0O0
0.0919
26.34
1.62
2002.67
1.86
F8B0_W8M0MC2_R0.2_T0O0
0.0923
26.33
1.62
1994.67
1.87
F8B0_W8M0MC2_R0.24_T0O0
0.0963
25.62
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T0O0
0.0990
25.66
1.67
1933.17
1.93
F8B0_W8M0MC4_R0.12_T0O0
0.1085
25.41
1.68
1897.61
1.96
F8B0_W8M0MC3_R0.16_T0O0
0.1182
24.55
1.74
1811.77
2.06
F1B0_W4M0MC0_R0.08_T0O0
0.1196
23.21
1.84
1729.60
2.15
F1B0_W4M0MC2_R0.12_T0O0
0.1390
21.92
1.95
1604.78
2.32
F1B0_W4M0MC3_R0.12_T0O0
0.1426
20.61
2.07
1457.62
2.56
F1B0_W4M0MC4_R0.12_T0O0
0.1500
20.01
2.14
1401.61
2.66
F1B0_W4M0MC3_R0.16_T0O0
0.1909
19.09
2.24
1310.82
2.84
F1B0_W4M0MC3_R0.2_T0O0
0.1942
18.57
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T0O0
0.2082
17.75
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T0O0
0.2094
17.71
2.41
1153.05
3.23
F1B0_W4M0MC4_R0.24_T0O0
0.2289
16.98
2.52
1091.18
3.42
F1B0_W4M0MC6_R0.24_T0O0
0.2553
15.54
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T0O0
0.2826
15.65
2.73
942.92
3.95
📖LPIPS: DBCache(Slow) FnBn + TaylorSeer O(1)
Config
LPIPS
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
Baseline: [FLUX.1 50 steps]
-
42.74
1.00
3726.87
1.00
F8B0_W8M0MC0_R0.08_T1O1
0.0558
28.06
1.53
2172.76
1.72
F8B0_W8M0MC2_R0.12_T1O1
0.0687
27.31
1.57
2074.10
1.80
F8B0_W8M0MC2_R0.16_T1O1
0.0690
26.46
1.62
2005.56
1.86
F8B0_W8M0MC2_R0.2_T1O1
0.0692
26.22
1.64
1985.38
1.88
F8B0_W8M0MC2_R0.24_T1O1
0.0699
25.64
1.67
1933.17
1.93
F8B0_W8M0MC2_R0.32_T1O1
0.0714
25.71
1.67
1933.17
1.93
F8B0_W8M0MC3_R0.12_T1O1
0.0772
26.09
1.65
1974.49
1.89
F8B0_W8M0MC4_R0.12_T1O1
0.0792
25.59
1.68
1912.99
1.95
F8B0_W8M0MC3_R0.16_T1O1
0.0933
24.67
1.74
1815.62
2.05
F8B0_W8M0MC3_R0.2_T1O1
0.0958
24.46
1.75
1794.16
2.08
F8B0_W8M0MC3_R0.24_T1O1
0.0964
23.84
1.80
1740.99
2.14
F8B0_W8M0MC3_R0.32_T1O1
0.0974
23.98
1.79
1740.99
2.14
F1B0_W4M0MC0_R0.08_T1O1
0.1032
23.29
1.84
1730.70
2.15
F1B0_W4M0MC2_R0.12_T1O1
0.1091
22.07
1.94
1604.04
2.32
F1B0_W4M0MC3_R0.12_T1O1
0.1147
20.70
2.07
1457.62
2.56
F1B0_W4M0MC4_R0.12_T1O1
0.1154
20.08
2.14
1404.17
2.65
F1B0_W4M0MC3_R0.2_T1O1
0.1733
18.63
2.30
1237.61
3.01
F1B0_W4M0MC4_R0.16_T1O1
0.1837
17.82
2.41
1164.76
3.20
F1B0_W4M0MC4_R0.2_T1O1
0.1839
17.71
2.42
1153.05
3.23
F1B0_W4M0MC4_R0.24_T1O1
0.2155
17.04
2.52
1091.18
3.42
F1B0_W4M0MC6_R0.24_T1O1
0.2310
15.61
2.75
944.75
3.94
F1B0_W4M0MC6_R0.32_T1O1
0.2677
15.70
2.73
944.02
3.95
📖LPIPS: DBCache(Fast) FnBn
Config
LPIPS
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T0O0
0.3899
11.40
3.91
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T0O0
0.4180
10.62
4.20
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T0O0
0.4703
11.26
3.96
651.90
5.72
F1B0_W8I4M0MC8_R0.8_T0O0
0.4829
10.46
4.26
578.69
6.44
F1B0_W16I4M0MC10_R0.8_T0O0
0.4850
10.48
4.26
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T0O0
0.5024
9.71
4.59
505.47
7.37
📖LPIPS: DBCache(Fast) FnBn + TaylorSeer O(1)
Config
LPIPS
Latency(s)
SpeedUp(↑)
TFLOPs
SpeedUp(↑)
F1B0_W8I2M0MC8_R0.8_T1O1
0.3620
11.44
3.93
651.90
5.72
F1B0_W8I2M0MC10_R0.8_T1O1
0.4248
10.66
4.22
578.69
6.44
F1B0_W8I4M0MC8_R0.8_T1O1
0.4463
10.56
4.26
578.69
6.44
F1B0_W16I4M0MC8_R0.8_T1O1
0.4478
11.42
3.94
651.90
5.72
F1B0_W16I4M0MC10_R0.8_T1O1
0.4662
10.65
4.22
578.69
6.44
F1B0_W8I4M0MC10_R0.8_T1O1
0.4855
9.83
4.57
505.47
7.37