Network differs for consecutive calls of Conformer Module. #257

@JackTemaki

When starting a fresh training, the network construction already runs twice, because the network apparently differs between consecutive calls. The diff is:

dict diff:
['encoder'] dict diff:
['encoder'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape'] ['extra_deps'] list diff len: len self: 1, len other: 2
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape_0'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape_0'] ['extra_deps'] list diff len: len self: 1, len other: 2
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] ['from'] list diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] ['from'] [0] self: 'base:relative_positional_encoding' != other: 'base:relative_positional_encoding/sin'
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['from'] self: 'sin' != other: 'concat'
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['out_shape'] set diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['out_shape']   Dim{F'conformer-enc-default-out-dim'(512)} not in other
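For readers who want to compare two net dicts by hand, here is a minimal sketch of a recursive diff that reports paths in a similar style to the log above. It is not RETURNN's own diff code; the function name diff_nested and the two toy dicts are made up for illustration, and sets such as out_shape are only compared by plain inequality.

```python
from typing import Any, List


def diff_nested(a: Any, b: Any, path: str = "") -> List[str]:
    """Collect human-readable differences between two nested structures."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: List[str] = []
        # Visit the union of keys so entries missing on either side are reported.
        for key in sorted(set(a) | set(b), key=str):
            sub = f"{path}[{key!r}]"
            if key not in a:
                out.append(f"{sub} missing in self")
            elif key not in b:
                out.append(f"{sub} missing in other")
            else:
                out.extend(diff_nested(a[key], b[key], sub))
        return out
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            return [f"{path} list diff len: len self: {len(a)}, len other: {len(b)}"]
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_nested(x, y, f"{path}[{i}]"))
        return out
    # Leaves (and mismatched container types) are compared by plain inequality.
    if a != b:
        return [f"{path} self: {a!r} != other: {b!r}"]
    return []


# Toy dicts (hypothetical), loosely mimicking two of the differences in the log.
first = {"output": {"from": "sin", "extra_deps": ["a"]}}
second = {"output": {"from": "concat", "extra_deps": ["a", "b"]}}

diff_lines = diff_nested(first, second)
for line in diff_lines:
    print(line)
```

Applied to the raw dicts returned by two consecutive get_network calls, an empty result would mean the two constructions agree.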

The network code can be found under:
https://github.com/rwth-i6/i6_experiments/blob/main/users/rossenbach/experiments/librispeech/librispeech_100_attention/rc_conformer_2023/rc_networks/conformer_aed_trial.py

The network is constructed via:

def get_network(epoch, **kwargs):
    nn.reset_default_root_name_ctx()
    net = construct_network(epoch=epoch, **network_kwargs)
    return nn.get_returnn_config().get_net_dict_raw_dict(net)

But epoch is not used within construct_network:

def construct_network(
        epoch: int,
        audio_features: nn.Data,
        bpe_labels: nn.Data,
        **kwargs
):
    net = ConformerAEDModel(
        bpe_size=bpe_labels.sparse_dim,
        audio_feature_dim=audio_features.dim_tags[audio_features.feature_dim_axis],
        **kwargs
    )

    out = net(
        audio_features=nn.get_extern_data(audio_features),
        audio_time=audio_features.dim_tags[audio_features.time_dim_axis],
        bpe_labels=nn.get_extern_data(bpe_labels),
        bpe_time=bpe_labels.dim_tags[bpe_labels.time_dim_axis]
    )
    out.mark_as_default_output()

    return net

The full log can be found under:
https://gist.github.com/JackTemaki/bc24ac9d5ced81c823a0b94fa0871720
