I noticed that we do not have dropout after the self-attention:
```python
# MHSA
x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
x_mhsa_out = x_mhsa + x_ffn1_out
```
This is different from the standard Transformer, which applies dropout to the output of each sub-layer before it is added to the residual. It is also different from the paper, whose MHSA module likewise applies dropout before the residual connection.
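For reference, a minimal sketch of the same block with dropout applied to the self-attention output before the residual addition; the `self.dropout` module name is hypothetical and not part of the current code:

```python
# MHSA, with dropout on the attention output before the residual add
x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
x_mhsa_drop = self.dropout(x_mhsa)  # hypothetical dropout module, name assumed
x_mhsa_out = x_mhsa_drop + x_ffn1_out
```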
Originally posted by @albertz in #233 (comment)