
Commit 9778f63

Update two proposed changes
Update two proposed changes to the existing Attention layer
1 parent 4ffd127

1 file changed (+6, -2)

rfcs/20200616-keras-multihead-attention.md

@@ -231,8 +231,9 @@ expension logic and multi-axes softmax will be handled locally in
 
 * Keras Dense Attention
 
-[tf.keras.layers.Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention)
-layer call method takes an optional argument, `mask`, which requires two
+We have two changes proposed to
+[tf.keras.layers.Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention).
+(1) The layer call method takes an optional argument, `mask`, which requires two
 tensors, `q_mask` and `v_mask`. They are following keras framework requirements
 with (batch_size, target_length) and (batch_size, source_length) as shapes. This
 limits the flexibility of masking and `MultiHeadAttention` layer generalize the
@@ -241,6 +242,9 @@ we would like to introduce an optional argument `attention_mask` for
 `tf.keras.layers.Attention`. In the reduced case of `tf.keras.layers.Attention`,
 the shape is (batch_size, target_length, source_length). Whenever
 `attention_mask` is specified, the `mask` argument is OK to be skipped.
+(2) The layer does not return attention scores. We will add the bool argument,
+`return_attention_scores` to the __init__ and return the attention score tensor if
+it is true.
 
 * TFA `MultiHeadAttention` Deprecation and Re-mapping
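To make the two proposals concrete, here is a minimal runnable sketch of a dot-product attention layer with the proposed behavior. `ProposedAttention` and its internals are illustrative assumptions, not the actual Keras implementation; only the argument names `attention_mask` and `return_attention_scores`, the mask shape, and the `__init__` placement come from the RFC text in the diff above.

```python
import tensorflow as tf


class ProposedAttention(tf.keras.layers.Layer):
    """Illustrative dot-product attention with the two proposed changes."""

    def __init__(self, return_attention_scores=False, **kwargs):
        super().__init__(**kwargs)
        # (2) Proposed bool __init__ argument controlling the return value.
        self.return_attention_scores = return_attention_scores

    def call(self, query, value, attention_mask=None):
        # scores: (batch_size, target_length, source_length)
        scores = tf.matmul(query, value, transpose_b=True)
        if attention_mask is not None:
            # (1) Proposed mask of shape (batch_size, target_length,
            # source_length); masked positions are pushed toward -inf
            # before the softmax.
            scores += (1.0 - tf.cast(attention_mask, scores.dtype)) * -1e9
        weights = tf.nn.softmax(scores, axis=-1)
        output = tf.matmul(weights, value)
        if self.return_attention_scores:
            return output, weights
        return output


# Usage with made-up shapes: batch=2, target_length=8, source_length=4, dim=16.
query = tf.random.normal([2, 8, 16])
value = tf.random.normal([2, 4, 16])
attention_mask = tf.ones([2, 8, 4], dtype=tf.bool)
layer = ProposedAttention(return_attention_scores=True)
output, scores = layer(query, value, attention_mask=attention_mask)
```

With `return_attention_scores=True` the layer returns a pair: `output` has shape (2, 8, 16) and `scores` has shape (2, 8, 4), matching the (batch_size, target_length, source_length) mask.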
