Changes from 1 commit
i6_models/assemblies/transformer/transformer_decoder_v1.py (3 changes: 2 additions, 1 deletion)
@@ -128,7 +128,8 @@ class TransformerDecoderV1Config(ModelConfiguration):
 block_cfg: Configuration for TransformerDecoderV1.
 input_dropout: Dropout applied to the input embedding.
 input_embedding_scale: Scale applied to the input embedding.
-    Set to `None` to apply a (tuned) default.
+    Set to `None` to apply a default that is suitable for ASR AED decoder models.
Review comment (Member):
I would not mention any specific model at all here. I think this is just confusing. I would instead just say what default you use.

+    When training a pure LM, scale 1.0 may be a better choice.
 num_blocks: Number of transformer blocks in the decoder.
 num_output: Number of output labels/vocab dim.
 logits_bias: Whether to add a bias to the output logits.
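To illustrate what `input_embedding_scale` controls, and why the reviewer asks for the docstring to name the concrete default, here is a minimal sketch. It assumes the scale is a single factor multiplied onto every input embedding vector. It also assumes that `None` resolves to `sqrt(model_dim)`, the embedding scaling from the original Transformer paper. Both assumptions are hypothetical and may not match what i6_models actually does. The helper names `resolve_embedding_scale` and `scale_embeddings` are illustrative and are not part of the i6_models API. Plain Python lists stand in for tensors so the sketch has no dependencies.

```python
import math
from typing import List, Optional


def resolve_embedding_scale(scale: Optional[float], model_dim: int) -> float:
    """Return the factor applied to the input embeddings.

    Hypothetical default: None -> sqrt(model_dim), as in the original
    Transformer. This is an assumption, not necessarily the i6_models default.
    """
    if scale is None:
        return math.sqrt(model_dim)
    return scale


def scale_embeddings(
    embeddings: List[List[float]], scale: Optional[float], model_dim: int
) -> List[List[float]]:
    """Multiply every embedding vector (one per label position) by the resolved scale."""
    factor = resolve_embedding_scale(scale, model_dim)
    return [[factor * x for x in vec] for vec in embeddings]


if __name__ == "__main__":
    emb = [[0.5, -0.25, 1.0, 0.0]]  # one label position, model_dim = 4
    print(scale_embeddings(emb, None, model_dim=4))  # hypothetical default: sqrt(4) = 2.0
    print(scale_embeddings(emb, 1.0, model_dim=4))   # scale 1.0, e.g. for a pure LM
```

With an explicit value such as `1.0` the embeddings pass through unchanged. This is why the docstring can simply state the numeric default instead of tying it to a model type.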