upgrade cache-dit api #21
Open
upgrade cache-dit api to 1.0.x
```shell
python run_benchmark.py \
    --ckpt ${CKPT} \
    --trace-file optimized_cache_dit.json.gz \
    --compile_export_mode compile \
    --disable_fa3 \
    --num_inference_steps 28 \
    --cache_dit_config cache_config.yaml \
    --output-file optimized_cache_dit.png \
    --disable_quant
```

```
Loading checkpoint shards: 100%|████████████████████████| 3/3 [00:00<00:00, 40.76it/s]
Loading pipeline components...:  71%|████████████████▏   | 5/7 [00:00<00:00, 14.82it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|████████████████████████| 2/2 [00:00<00:00, 78.88it/s]
Loading pipeline components...: 100%|████████████████████| 7/7 [00:00<00:00, 10.54it/s]
INFO 10-11 06:19:58 [cache_adapter.py:46] FluxPipeline is officially supported by cache-dit. Use it's pre-defined BlockAdapter directly!
INFO 10-11 06:19:58 [block_adapters.py:201] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
INFO 10-11 06:19:58 [block_adapters.py:482] Match Block Forward Pattern: FluxTransformerBlock, ForwardPattern.Pattern_1
INFO 10-11 06:19:58 [block_adapters.py:482] IN:('hidden_states', 'encoder_hidden_states'), OUT:('encoder_hidden_states', 'hidden_states'))
INFO 10-11 06:19:58 [block_adapters.py:482] Match Block Forward Pattern: FluxSingleTransformerBlock, ForwardPattern.Pattern_1
INFO 10-11 06:19:58 [block_adapters.py:482] IN:('hidden_states', 'encoder_hidden_states'), OUT:('encoder_hidden_states', 'hidden_states'))
INFO 10-11 06:19:58 [cache_adapter.py:141] Use default 'enable_separate_cfg' from block adapter register: False, Pipeline: FluxPipeline.
INFO 10-11 06:19:58 [cache_adapter.py:275] Collected Cache Config: DBCACHE_F1B0_W0M0MC2_R0.3, Calibrator Config: TaylorSeer_O(2)
INFO 10-11 06:19:58 [pattern_base.py:51] Match Cached Blocks: CachedBlocks_Pattern_0_1_2, for transformer_blocks, cache_context: transformer_blocks_140567643384928, cache_manager: FluxPipeline_140567649235936.
INFO 10-11 06:19:58 [pattern_base.py:51] Match Cached Blocks: CachedBlocks_Pattern_0_1_2, for single_transformer_blocks, cache_context: single_transformer_blocks_140567528265904, cache_manager: FluxPipeline_140567649235936.
```

```
time mean/var: tensor([13.3521, 13.3886, 13.4104, 13.4335, 13.4646, 13.4767, 13.4747, 13.4828, 13.4852, 13.4868]) 13.445541381835938 0.0022438988089561462

🤗Cache Options: FluxSingleTransformerBlock
{'cache_config': BasicCacheConfig(Fn_compute_blocks=1, Bn_compute_blocks=0, residual_diff_threshold=0.3, max_warmup_steps=0, max_cached_steps=-1, max_continuous_cached_steps=2, enable_separate_cfg=False, cfg_compute_first=False, cfg_diff_compute_separate=True), 'calibrator_config': TaylorSeerCalibratorConfig(enable_calibrator=True, enable_encoder_calibrator=True, calibrator_type='taylorseer', calibrator_cache_type='residual', calibrator_kwargs={}, taylorseer_order=2), 'name': 'single_transformer_blocks_140567528265904'}

⚡️Cache Steps and Residual Diffs Statistics: FluxSingleTransformerBlock
| Cache Steps | Diffs P00 | Diffs P25 | Diffs P50 | Diffs P75 | Diffs P95 | Diffs Min | Diffs Max |
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 7           | 0.117     | 0.304     | 0.523     | 0.6       | 0.769     | 0.117     | 0.793     |

🤗Cache Options: FluxTransformerBlock
{'cache_config': BasicCacheConfig(Fn_compute_blocks=1, Bn_compute_blocks=0, residual_diff_threshold=0.3, max_warmup_steps=0, max_cached_steps=-1, max_continuous_cached_steps=2, enable_separate_cfg=False, cfg_compute_first=False, cfg_diff_compute_separate=True), 'calibrator_config': TaylorSeerCalibratorConfig(enable_calibrator=True, enable_encoder_calibrator=True, calibrator_type='taylorseer', calibrator_cache_type='residual', calibrator_kwargs={}, taylorseer_order=2), 'name': 'transformer_blocks_140567643384928'}

⚡️Cache Steps and Residual Diffs Statistics: FluxTransformerBlock
| Cache Steps | Diffs P00 | Diffs P25 | Diffs P50 | Diffs P75 | Diffs P95 | Diffs Min | Diffs Max |
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 17          | 0.034     | 0.064     | 0.111     | 0.163     | 0.312     | 0.034     | 0.32      |
```
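The stats above reflect cache-dit's residual-diff caching: a denoising step's transformer blocks are skipped (served from cache) when the residual difference versus the last fully computed step falls below `residual_diff_threshold` (0.3 here), with `max_continuous_cached_steps=2` capping consecutive skips and `max_warmup_steps=0` meaning no forced-compute warmup. A minimal self-contained sketch of that decision rule follows; it is illustrative only, not cache-dit's actual implementation, and `plan_cached_steps` is a hypothetical helper name:

```python
# Illustrative sketch of a DBCache-style step-skipping decision (NOT cache-dit's
# actual code). Parameter defaults mirror the logged config:
# residual_diff_threshold=0.3, max_warmup_steps=0, max_continuous_cached_steps=2.

def plan_cached_steps(residual_diffs, threshold=0.3, max_warmup_steps=0,
                      max_continuous_cached_steps=2):
    """Return indices of steps whose blocks would be cache-skipped.

    residual_diffs[i] is the relative difference between the residual at
    step i and at the last fully computed step (hypothetical inputs here).
    """
    cached, streak = [], 0
    for step, diff in enumerate(residual_diffs):
        in_warmup = step < max_warmup_steps          # warmup steps always compute
        similar = diff < threshold                   # close enough to reuse cache
        can_skip = streak < max_continuous_cached_steps
        if not in_warmup and similar and can_skip:
            cached.append(step)                      # reuse cached residuals
            streak += 1
        else:
            streak = 0                               # full recompute resets the cap
    return cached

diffs = [0.50, 0.12, 0.08, 0.07, 0.06, 0.40, 0.10, 0.09]
print(plan_cached_steps(diffs))  # → [1, 2, 4, 6, 7]
```

Note how the streak cap forces a recompute at step 3 even though its diff (0.07) is under the threshold; that is the role `max_continuous_cached_steps=2` plays in bounding drift, and why the two block groups above report different cached-step counts (7 vs 17) despite sharing one config.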