
Conformer: Missing dropout after the self-attention #245

@albertz

I noticed that we do not have dropout after the self-attention:

    # MHSA
    x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
    x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
    x_mhsa_out = x_mhsa + x_ffn1_out

This differs from the standard Transformer, and also from the Conformer paper.
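For reference, a minimal sketch of what adding the dropout could look like, assuming the layer has a `dropout` rate attribute and that `rf.dropout` accepts the source, the drop probability, and a broadcast axis (the exact attribute names and signature here are assumptions, not the actual implementation):

    # MHSA
    x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
    x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
    # Hypothetical fix: drop out the self-attention output before the residual add,
    # matching the standard Transformer and the Conformer paper.
    x_mhsa = rf.dropout(x_mhsa, self.dropout, axis=x_mhsa.feature_dim)
    x_mhsa_out = x_mhsa + x_ffn1_out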

Originally posted by @albertz in #233 (comment)
