
Commit 557a08d

Improve transformer doc [skip ci]
1 parent bc4fac9 commit 557a08d

File tree

  • bayesflow/networks/transformers

1 file changed: +8 −0 lines changed

bayesflow/networks/transformers/mab.py

Lines changed: 8 additions & 0 deletions
@@ -10,9 +10,17 @@
 class MultiHeadAttentionBlock(keras.Layer):
     """Implements the MAB block from [1] which represents learnable cross-attention.
 
+    In particular, it uses a so-called "Post-LN" transformer block [2] which applies
+    layer norm following attention and following MLP. A "Pre-LN" transformer block
+    can easily be implemented.
+
     [1] Lee, J., Lee, Y., Kim, J., Kosiorek, A., Choi, S., & Teh, Y. W. (2019).
     Set transformer: A framework for attention-based permutation-invariant neural networks.
     In International conference on machine learning (pp. 3744-3753). PMLR.
+
+    [2] Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., ... & Liu, T. (2020, November).
+    On layer normalization in the transformer architecture.
+    In International conference on machine learning (pp. 10524-10533). PMLR.
     """
 
     def __init__(
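
Aside: a minimal sketch of the distinction the new docstring draws, assuming Keras 3. This is not the actual MultiHeadAttentionBlock implementation; the names (att, mlp, ln1, ln2, post_ln, pre_ln) and the dimension are illustrative. "Post-LN" [2] applies layer norm after each residual sum, while "Pre-LN" applies it to each sublayer's input.

import keras

dim = 16  # illustrative feature dimension
att = keras.layers.MultiHeadAttention(num_heads=4, key_dim=dim)
mlp = keras.Sequential(
    [keras.layers.Dense(2 * dim, activation="gelu"), keras.layers.Dense(dim)]
)
ln1 = keras.layers.LayerNormalization()
ln2 = keras.layers.LayerNormalization()

def post_ln(q, kv):
    # Post-LN: layer norm follows the attention residual, then the MLP residual.
    h = ln1(q + att(q, kv))
    return ln2(h + mlp(h))

def pre_ln(q, kv):
    # Pre-LN: layer norm is applied to each sublayer's input;
    # the residual path itself stays unnormalized.
    h = q + att(ln1(q), ln1(kv))
    return h + mlp(ln2(h))

Pre-LN stacks typically add one final layer norm after the last block; that detail (and the choice of separate norms for query and key/value inputs) is omitted from this sketch.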
