Changes to `src/diffusers/hooks/taylorseer_cache.py`: 8 additions, 18 deletions.
````diff
@@ -44,24 +44,14 @@ class TaylorSeerCacheConfig:
     See: https://huggingface.co/papers/2503.06923

     Attributes:
-        warmup_steps (int, defaults to 3): Number of warmup steps without caching.
-        predict_steps (int, defaults to 5): Number of prediction (cache) steps between non-cached steps.
-        stop_predicts (Optional[int], defaults to None): Step after which predictions are stopped and full computation is always performed.
-        max_order (int, defaults to 1): Maximum order of Taylor series expansion to approximate the features.
-        taylor_factors_dtype (torch.dtype, defaults to torch.float32): Data type for Taylor series expansion factors.
-        architecture (str, defaults to None): Architecture for which the cache is applied. If we know the architecture, we can use the special cache identifiers.
-        skip_identifiers (List[str], defaults to []): Identifiers for modules to skip computation.
-        cache_identifiers (List[str], defaults to []): Identifiers for modules to cache.
-
-    By default, this approximation can be applied to all attention modules, but in some architectures, where the outputs of attention modules are not used for any residual computation, we can skip this attention cache step, so we have to identify the next modules to cache.
-        attn_output = self.attention(x) # mark this attention module to skip computation
-        ffn_output = self.ffn(attn_output) # ffn_output will be cached
-        return ffn_output
-    ```
+        warmup_steps (`int`, defaults to `3`): Run the full computation for the first `N` steps before applying this caching strategy. A higher `N` gives outputs closer to the non-cached result.
+        predict_steps (`int`, defaults to `5`): Recompute the module states every `N` iterations. If this is set to `N`, the module computation is skipped `N - 1` times before the module states are computed again.
+        stop_predicts (`int`, *optional*, defaults to `None`): Disable the caching strategy after this step, which helps produce fine-grained final outputs. If not provided, the caching strategy is applied until the end of inference.
+        max_order (`int`, defaults to `1`): Maximum order of the Taylor series expansion used to approximate the features. In theory, a higher order brings the output closer to the actual value, but it also requires more computation.
+        taylor_factors_dtype (`torch.dtype`, defaults to `torch.float32`): Data type used to compute the Taylor series expansion factors.
+        architecture (`str`, *optional*, defaults to `None`): Option to use a cache strategy optimized for a specific architecture. By default, this cache strategy is applied to all `Attention` modules.
+        skip_identifiers (`List[str]`, *optional*, defaults to `[]`): Regex patterns identifying modules whose computation should be skipped.
+        cache_identifiers (`List[str]`, *optional*, defaults to `[]`): Regex patterns identifying modules whose outputs should be cached.
````
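For context, here is a minimal usage sketch of the config documented above. It assumes `TaylorSeerCacheConfig` is importable from `diffusers` and is wired through the same `enable_cache` entry point that other diffusers cache configs (e.g. `FasterCacheConfig`) use; the pipeline, model, and step counts are placeholders, not something this PR prescribes.

```python
import torch
from diffusers import FluxPipeline

# Assumption: the PR exposes the config at the top level like other cache
# configs. If not, it would be imported from diffusers.hooks instead.
from diffusers import TaylorSeerCacheConfig

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

config = TaylorSeerCacheConfig(
    warmup_steps=3,                      # full computation for the first 3 steps
    predict_steps=5,                     # refresh real module states every 5th step
    stop_predicts=45,                    # disable caching near the end for fine detail
    max_order=1,                         # first-order Taylor extrapolation
    taylor_factors_dtype=torch.float32,  # keep factors in fp32 for stability
)
pipe.transformer.enable_cache(config)    # assumed CacheMixin-style entry point

image = pipe("a photo of a cat", num_inference_steps=50).images[0]
```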
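The deleted docstring example (an attention output consumed only by the feed-forward block, with no residual use) maps onto the two identifier lists roughly as follows; the module-name patterns are hypothetical and depend on the model's actual module names:

```python
# Hypothetical sketch: skip the attention modules themselves and
# Taylor-cache the feed-forward outputs that consume them.
# The regex patterns below are illustrative, not real module names.
config = TaylorSeerCacheConfig(
    skip_identifiers=[r"blocks\.\d+\.attn"],  # computation skipped entirely
    cache_identifiers=[r"blocks\.\d+\.ff"],   # outputs approximated via Taylor expansion
)
```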
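To make `max_order` and `taylor_factors_dtype` concrete, here is a standalone sketch of the underlying idea: cached features are extrapolated along the timestep axis with a truncated Taylor series whose derivative factors come from finite differences. This illustrates the technique from the TaylorSeer paper, not the hook's actual code.

```python
import torch

def taylor_predict(factors: list[torch.Tensor], dt: float) -> torch.Tensor:
    """Extrapolate a cached feature from its Taylor factors.

    factors[k] approximates the k-th derivative of the feature with respect
    to the step index (so len(factors) == max_order + 1); dt is the number
    of steps since the factors were last refreshed.
    """
    out = torch.zeros_like(factors[0])
    coeff = 1.0
    for k, factor in enumerate(factors):
        if k > 0:
            coeff *= dt / k          # accumulates dt**k / k!
        out = out + coeff * factor
    return out

# max_order=1: keep the feature and a finite-difference slope (fp32 factors).
y_prev = torch.tensor([1.0], dtype=torch.float32)
y_curr = torch.tensor([3.0], dtype=torch.float32)
factors = [y_curr, y_curr - y_prev]      # [f, f'] with unit step spacing

print(taylor_predict(factors, dt=2.0))   # tensor([7.]) -> 3 + 2*2
```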