
Commit acfebfa

toilaluan committed: update docs

1 parent 7238d40 · commit acfebfa

File tree

1 file changed, +8 -18 lines


src/diffusers/hooks/taylorseer_cache.py

Lines changed: 8 additions & 18 deletions
@@ -44,24 +44,14 @@ class TaylorSeerCacheConfig:
     See: https://huggingface.co/papers/2503.06923
 
     Attributes:
-        warmup_steps (int, defaults to 3): Number of warmup steps without caching.
-        predict_steps (int, defaults to 5): Number of prediction (cache) steps between non-cached steps.
-        stop_predicts (Optional[int], defaults to None): Step after which predictions are stopped and full computation is always performed.
-        max_order (int, defaults to 1): Maximum order of Taylor series expansion to approximate the features.
-        taylor_factors_dtype (torch.dtype, defaults to torch.float32): Data type for Taylor series expansion factors.
-        architecture (str, defaults to None): Architecture for which the cache is applied. If we know the architecture, we can use the special cache identifiers.
-        skip_identifiers (List[str], defaults to []): Identifiers for modules to skip computation.
-        cache_identifiers (List[str], defaults to []): Identifiers for modules to cache.
-
-    By default, this approximation can be applied to all attention modules, but in some architectures, where the outputs of attention modules are not used for any residual computation, we can skip this attention cache step, so we have to identify the next modules to cache.
-    Example:
-    ```python
-    ...
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        attn_output = self.attention(x)  # mark this attention module to skip computation
-        ffn_output = self.ffn(attn_output)  # ffn_output will be cached
-        return ffn_output
-    ```
+        warmup_steps (`int`, defaults to `3`): Run `N` full computations before applying the caching strategy. A higher `N` gives outputs closer to the uncached baseline.
+        predict_steps (`int`, defaults to `5`): Recompute the module states every `N` iterations. If this is set to `N`, the module computation is skipped `N - 1` times before the module states are computed again.
+        stop_predicts (`int`, *optional*, defaults to `None`): Disable the caching strategy after this step; this helps produce fine-grained outputs. If not provided, the caching strategy is applied until the end of inference.
+        max_order (`int`, defaults to `1`): Maximum order of the Taylor series expansion used to approximate the features. In theory, a higher order brings the approximation closer to the actual value, but it also requires more computation.
+        taylor_factors_dtype (`torch.dtype`, defaults to `torch.float32`): Data type used to compute the Taylor series expansion factors.
+        architecture (`str`, *optional*, defaults to `None`): Use a cache strategy optimized for a specific architecture. By default, the cache strategy is applied to all `Attention` modules.
+        skip_identifiers (`List[str]`, *optional*, defaults to `[]`): Regex patterns identifying modules whose computation should be skipped.
+        cache_identifiers (`List[str]`, *optional*, defaults to `[]`): Regex patterns identifying modules whose outputs should be cached.
     """
 
     warmup_steps: int = 3
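
The attribute docs above describe a step schedule, but the commit drops the old inline example, so here is a minimal sketch of the schedule they imply: which denoising steps run a full forward pass versus a cached Taylor-series prediction. The helper `is_full_compute_step` is hypothetical; its rule is inferred from the wording of `warmup_steps`, `predict_steps`, and `stop_predicts`, not taken from the hook's actual implementation.

```python
# Hypothetical illustration of the schedule implied by the docstring;
# the real hook's logic may differ.
from typing import Optional


def is_full_compute_step(
    step: int,
    warmup_steps: int = 3,
    predict_steps: int = 5,
    stop_predicts: Optional[int] = None,
) -> bool:
    if step < warmup_steps:
        return True  # warmup: always run the full computation
    if stop_predicts is not None and step >= stop_predicts:
        return True  # caching disabled after `stop_predicts`
    # After warmup, refresh the cached states every `predict_steps` steps;
    # the `predict_steps - 1` steps in between reuse the Taylor prediction.
    return (step - warmup_steps) % predict_steps == 0


print(["full" if is_full_compute_step(s) else "cached" for s in range(12)])
# ['full', 'full', 'full', 'full', 'cached', 'cached', 'cached', 'cached',
#  'full', 'cached', 'cached', 'cached']
```

If `TaylorSeerCacheConfig` follows the same pattern as the existing cache configs in diffusers (e.g. `FasterCacheConfig` used via `pipe.transformer.enable_cache(...)`), usage would presumably look similar, but the entry point is not shown in this commit.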
