1 file changed: +6, -2 lines

@@ -231,8 +231,9 @@ expansion logic and multi-axes softmax will be handled locally in

* Keras Dense Attention

- [tf.keras.layers.Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention)
- layer call method takes an optional argument, `mask`, which requires two
+ We propose two changes to
+ [tf.keras.layers.Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention).
+ (1) The layer's call method takes an optional argument, `mask`, which requires two
tensors, `q_mask` and `v_mask`. They follow the Keras framework requirements,
with shapes (batch_size, target_length) and (batch_size, source_length). This
limits the flexibility of masking, and the `MultiHeadAttention` layer generalizes the
@@ -241,6 +242,9 @@ we would like to introduce an optional argument `attention_mask` for
`tf.keras.layers.Attention`. In the reduced case of `tf.keras.layers.Attention`,
the shape is (batch_size, target_length, source_length). Whenever
`attention_mask` is specified, the `mask` argument may be omitted.
+ (2) The layer does not return attention scores. We will add a bool argument,
+ `return_attention_scores`, to `__init__` and return the attention score tensor
+ when it is true. (A usage sketch follows the diff below.)

* TFA `MultiHeadAttention` Deprecation and Re-mapping

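To make the two changes proposed above for `tf.keras.layers.Attention` concrete, here is a minimal sketch. The first part uses only the existing, documented `mask=[q_mask, v_mask]` call signature; the `attention_mask` call argument and the `return_attention_scores` constructor flag in the second part come from this proposal and are assumptions about a future API, so that part is left commented out.

```python
import tensorflow as tf

# Existing behavior: the optional `mask` call argument is a list
# [query_mask, value_mask] with shapes (batch_size, target_length)
# and (batch_size, source_length).
batch, target_len, source_len, dim = 2, 4, 6, 8
query = tf.random.normal((batch, target_len, dim))
value = tf.random.normal((batch, source_len, dim))
q_mask = tf.ones((batch, target_len), dtype=tf.bool)
v_mask = tf.ones((batch, source_len), dtype=tf.bool)

layer = tf.keras.layers.Attention()
output = layer([query, value], mask=[q_mask, v_mask])
print(output.shape)  # (2, 4, 8)

# Proposed additions (RFC sketch only; not guaranteed to exist, or to have
# this exact signature, in any released TensorFlow version):
#
#   attention_mask = tf.ones((batch, target_len, source_len), dtype=tf.bool)
#   layer = tf.keras.layers.Attention(return_attention_scores=True)
#   output, scores = layer([query, value], attention_mask=attention_mask)
#   # `scores` would have shape (batch_size, target_length, source_length).
```

The 3-D `attention_mask` subsumes the 2-D pair: `q_mask` and `v_mask` can always be combined into a (batch_size, target_length, source_length) mask with a broadcast logical AND, while the reverse is not possible in general, which is why the proposal treats `mask` as skippable once `attention_mask` is given.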