     This function returns necessary arguments to call `flash_attn_varlen_func`.
     All three query, key, value states will be flattened.
-    Cummulative lengths of each examples in the batch will be extracted from position_ids.
+    Cumulative lengths of each examples in the batch will be extracted from position_ids.
 
-    NOTE: ideally cummulative lengths should be prepared at the data collator stage
+    NOTE: ideally cumulative lengths should be prepared at the data collator stage
 
     Arguments:
         query (`torch.Tensor`):
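As context for the hunk above, here is a minimal sketch of how cumulative sequence lengths can be derived from `position_ids` when several examples are packed into one row and positions restart at 0 at each example boundary. The helper name is hypothetical and this is not the library's implementation, just an illustration of the idea the docstring describes:

```python
import torch

def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    # Packed position_ids restart at 0 at every example boundary,
    # e.g. [0, 1, 2, 0, 1, 0, 1, 2, 3] for three packed examples.
    position_ids = position_ids.flatten()
    # Indices where a new example starts (position resets to 0).
    starts = torch.nonzero(position_ids == 0, as_tuple=False).flatten()
    # cu_seqlens layout expected by flash_attn_varlen_func:
    # [0, len_0, len_0 + len_1, ..., total_tokens]
    total = torch.tensor([position_ids.numel()], device=position_ids.device)
    return torch.cat([starts, total]).to(torch.int32)

# cu_seqlens_from_position_ids(torch.tensor([0, 1, 2, 0, 1, 0, 1, 2, 3]))
# -> tensor([0, 3, 5, 9], dtype=torch.int32)
```

Recomputing these lengths on every forward pass is extra work, which is why the NOTE suggests preparing them at the data collator stage instead.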
@@ -268,7 +268,7 @@ def _flash_attention_forward(
         softmax_scale (`float`, *optional*):
             The scaling of QK^T before applying softmax. Default to 1 / sqrt(head_dim)
         use_top_left_mask (`bool`, defaults to `False`):
-            flash_attn<2.1 generates top-left aligned causal mask, while what is needed here is bottom-right alignement, that was made default for flash_attn>=2.1. This attribute is used to handle this difference.
+            flash_attn<2.1 generates top-left aligned causal mask, while what is needed here is bottom-right alignment, that was made default for flash_attn>=2.1. This attribute is used to handle this difference.
         softcap (`float`, *optional*):
             Softcap for the attention logits, used e.g. in gemma2.
         deterministic (`bool`, *optional*):
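To make the `use_top_left_mask` note concrete, below is a small illustration (not taken from the diff) of the two causal-mask alignments for the case where the query is shorter than the key/value sequence, e.g. decoding new tokens against a cache:

```python
import torch

q_len, kv_len = 2, 5  # e.g. 2 new query tokens attending over 5 cached keys

i = torch.arange(q_len).unsqueeze(-1)  # query rows
j = torch.arange(kv_len)               # key columns

# Bottom-right alignment (flash_attn >= 2.1 default): the last query row lines
# up with the last key column, so every new token can see the entire past.
bottom_right = j <= (kv_len - q_len) + i
# tensor([[1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])

# Top-left alignment (flash_attn < 2.1): query row i only sees keys 0..i,
# which drops most of the cache when kv_len > q_len.
top_left = j <= i
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0]])
```

In practice, handling the older behaviour typically amounts to disabling the causal flag for the single-query decoding step when flash_attn produces top-left aligned masks, so that the lone query token can still attend to the whole cache.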
@@ -374,9 +374,9 @@ class FlashAttentionKwargs(TypedDict, total=False):
         3. SDPA implementation, if available and supported by the model type. (`LlamaSdpaAttention` for example)
         4. The default model's implementation otherwise (`LlamaAttention` for example) .
         """
-        # Here we use config._attn_implementation_internal to check whether the attention implementation was explicitely set by the user.
+        # Here we use config._attn_implementation_internal to check whether the attention implementation was explicitly set by the user.
         # The property `PretrainedConfig._attn_implementation` is never `None`, for backward compatibility (always fall back on "eager").
         # The `hasattr` here is used as some Transformers tests for some reason do not call PretrainedConfig __init__ (e.g. test_no_super_init_config_and_model)
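For orientation, a rough sketch of the fallback order described in the docstring and comments above; the function name and the `sdpa_supported` flag are placeholders rather than the actual transformers API, and the real logic covers more cases (including flash attention 2) than shown here:

```python
def resolve_attn_implementation(config, sdpa_supported: bool) -> str:
    # An implementation explicitly set by the user always wins; this is what the
    # config._attn_implementation_internal check above is for. `getattr` mirrors
    # the `hasattr` guard for configs that never ran PretrainedConfig.__init__.
    explicit = getattr(config, "_attn_implementation_internal", None)
    if explicit is not None:
        return explicit
    # 3. SDPA implementation, if available and supported by the model type.
    if sdpa_supported:
        return "sdpa"
    # 4. The model's default (eager) implementation otherwise.
    return "eager"
```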