
Missing tie_word_embeddings in config.json causes incorrect weight tying in transformers 4.54+ #38

@diodiogod

Description

Issue

The config.json in the Step-Audio-EditX model is missing the tie_word_embeddings configuration key. This causes transformers 4.54+ to incorrectly tie the lm_head and embed_tokens weights together, even though they have different values in the checkpoint.

Root Cause

  • config.json does not contain "tie_word_embeddings"
  • transformers 4.54+ defaults to tie_word_embeddings=True when this key is missing
  • The model checkpoint stores separate weights for lm_head and embed_tokens (their norms differ)
  • Tying overwrites the correct lm_head weights with the embed_tokens weights
  • The model then generates text tokens instead of audio tokens
  • Result: silent/gibberish audio, and generation that ignores max_new_tokens (the check sketched below confirms the tying)
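
To confirm the tying from user code, compare the two weight tensors after loading. This is a minimal sketch: the local path is a placeholder, and it assumes the model loads via AutoModelForCausalLM and exposes the standard get_input_embeddings()/get_output_embeddings() accessors.

```python
# Sketch: detect unintended weight tying after loading.
# "path/to/Step-Audio-EditX" is a placeholder, not the actual repo id.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/Step-Audio-EditX", torch_dtype=torch.bfloat16
)

lm_head = model.get_output_embeddings().weight
embed = model.get_input_embeddings().weight

# If transformers tied the weights, both names refer to the same tensor and the
# norms match, even though the checkpoint stores different values for them.
print("same storage:", lm_head.data_ptr() == embed.data_ptr())
print("lm_head norm:", lm_head.float().norm().item())
print("embed  norm:", embed.float().norm().item())
```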

Solution

Add the following line to config.json:

"tie_word_embeddings": false

This tells transformers to keep the weights separate, which matches your checkpoint structure.
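
Until the config is updated, a user-side equivalent is to override the flag at load time. This is a sketch, not the official loading path for this repo; the path is a placeholder and trust_remote_code may be required.

```python
# Sketch: force untied weights at load time instead of editing config.json.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("path/to/Step-Audio-EditX")
config.tie_word_embeddings = False  # what the missing key should say

model = AutoModelForCausalLM.from_pretrained(
    "path/to/Step-Audio-EditX", config=config
)
```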

Impact

This affects all users of Step-Audio-EditX with transformers 4.54+. Until config.json is fixed, users have to implement workarounds, such as the post-load weight restoration sketched below, to restore the correct weights after model loading.
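
For illustration, one such workaround re-reads the original lm_head tensor from the checkpoint and copies it back after loading. This is a sketch that assumes a sharded safetensors checkpoint with the standard model.safetensors.index.json index and a weight stored under the key lm_head.weight; the local path is a placeholder.

```python
# Sketch: restore the original lm_head weights that tying overwrote.
import json
import os

import torch
from safetensors import safe_open
from transformers import AutoModelForCausalLM

repo_dir = "path/to/Step-Audio-EditX"  # placeholder local checkout
model = AutoModelForCausalLM.from_pretrained(repo_dir, torch_dtype=torch.bfloat16)

# Look up which shard holds lm_head.weight, read it, and copy it over the
# tied (incorrect) in-memory tensor.
with open(os.path.join(repo_dir, "model.safetensors.index.json")) as f:
    shard_name = json.load(f)["weight_map"]["lm_head.weight"]
with safe_open(os.path.join(repo_dir, shard_name), framework="pt") as f:
    original = f.get_tensor("lm_head.weight")
model.get_output_embeddings().weight.data.copy_(original)
```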
