Network differs for consecutive calls of Conformer Module.

When starting a fresh training, the network construction already runs twice, because the network apparently differs between the calls.

The diff is:
```
dict diff:
['encoder'] dict diff:
['encoder'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape'] ['extra_deps'] list diff len: len self: 1, len other: 2
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape_0'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['RelPosSelfAttention._rel_shift'] ['subnetwork'] ['reshape_0'] ['extra_deps'] list diff len: len self: 1, len other: 2
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] ['from'] list diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['linear_pos'] ['subnetwork'] ['dot'] ['from'] [0] self: 'base:relative_positional_encoding' != other: 'base:relative_positional_encoding/sin'
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] dict diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['from'] self: 'sin' != other: 'concat'
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['out_shape'] set diff:
['encoder'] ['subnetwork'] ['layers'] ['subnetwork'] ['0'] ['subnetwork'] ['self_att'] ['subnetwork'] ['relative_positional_encoding'] ['subnetwork'] ['output'] ['out_shape']   Dim{F'conformer-enc-default-out-dim'(512)} not in other
```

The network code can be found under:
https://github.com/rwth-i6/i6_experiments/blob/main/users/rossenbach/experiments/librispeech/librispeech_100_attention/rc_conformer_2023/rc_networks/conformer_aed_trial.py

The network is constructed via:
```python
def get_network(epoch, **kwargs):
    nn.reset_default_root_name_ctx()
    net = construct_network(epoch=epoch, **network_kwargs)
    return nn.get_returnn_config().get_net_dict_raw_dict(net)
```
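To check this without starting a training, one could call `get_network` twice for the same epoch and compare the two results directly. The following is only a minimal sketch: `dict_diff`, `check_deterministic` and `toy_get_network` are hypothetical names introduced here for illustration and are not part of RETURNN or returnn_common. The toy function stands in for the real `get_network` and deliberately leaks state between calls, so that the second call yields a different dict.

```python
from typing import Any, Callable, List


def dict_diff(a: Any, b: Any, path: str = "") -> List[str]:
    """Recursively collect readable differences between two nested structures."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b), key=str):
            sub = f"{path}[{key!r}]"
            if key not in a:
                diffs.append(f"{sub} not in self")
            elif key not in b:
                diffs.append(f"{sub} not in other")
            else:
                diffs.extend(dict_diff(a[key], b[key], sub))
        return diffs
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            return [f"{path} list diff len: len self: {len(a)}, len other: {len(b)}"]
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(dict_diff(x, y, f"{path}[{i}]"))
        return diffs
    # Leaf values (or mismatching container types): plain comparison.
    if a != b:
        return [f"{path} self: {a!r} != other: {b!r}"]
    return []


def check_deterministic(get_network: Callable[..., Any], epoch: int = 1) -> List[str]:
    """Call `get_network` twice for the same epoch and return all differences."""
    first = get_network(epoch)
    second = get_network(epoch)
    return dict_diff(first, second)


# Hypothetical stand-in for the real get_network: module-level state
# survives between calls, so the second result differs from the first.
_calls: List[int] = []


def toy_get_network(epoch: int) -> dict:
    _calls.append(epoch)
    source = "sin" if len(_calls) == 1 else "concat"
    return {"output": {"class": "copy", "from": source}}


diffs = check_deterministic(toy_get_network)
for line in diffs:
    print(line)
```

An empty result from `check_deterministic` would mean that both calls produced the same dict. With the real `get_network`, any reported path should correspond to the entries in the diff above, assuming the values in the net dict support `!=` comparison.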

However, `epoch` is not used within `construct_network`:
```python
def construct_network(
        epoch: int,
        audio_features: nn.Data,
        bpe_labels: nn.Data,
        **kwargs
):
    net = ConformerAEDModel(
        bpe_size=bpe_labels.sparse_dim,
        audio_feature_dim=audio_features.dim_tags[audio_features.feature_dim_axis],
        **kwargs
    )

    out = net(
        audio_features=nn.get_extern_data(audio_features),
        audio_time=audio_features.dim_tags[audio_features.time_dim_axis],
        bpe_labels=nn.get_extern_data(bpe_labels),
        bpe_time=bpe_labels.dim_tags[bpe_labels.time_dim_axis]
    )
    out.mark_as_default_output()

    return net
```

The full log can be found under:
https://gist.github.com/JackTemaki/bc24ac9d5ced81c823a0b94fa0871720

