Conversation

@harshaljanjani
Contributor

What does this PR do?

The following issues were identified and fixed in this PR:

  1. SwitchTransformersConfig incorrectly created sparse (MoE) layers when num_sparse_encoder_layers=0 and num_layers=1, due to the previous step-computation logic. The modeling code's sparse_step == 1 condition would then trigger, creating a sparse layer even though none was requested. This is fixed by setting encoder_sparse_step = 0 and decoder_sparse_step = 0 when no sparse layers are requested, a case the modeling code already handles correctly via if sparse_step > 0 else False.
  2. Updated an outdated comment in audio_utils.py claiming that spectrogram() does not support batching, even though spectrogram_batch() already exists. The comment is now a note referencing the batch function.
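The fix in item 1 can be sketched as follows. This is a simplified, standalone sketch: the helper names and the exact modulo formula for marking a layer sparse are illustrative assumptions, not the library's verbatim source; only the `if sparse_step > 0 else False` guard is taken from the PR description.

```python
def encoder_sparse_step(num_layers: int, num_sparse_encoder_layers: int) -> int:
    """Compute the sparse-layer interval for the encoder (illustrative).

    Before the fix, the no-sparse-layers case could fall through to a
    step of 1 for a single-layer model, which the modeling code reads
    as "every layer is sparse". After the fix, 0 means "no sparse
    layers at all".
    """
    if num_sparse_encoder_layers > 0:
        return num_layers // num_sparse_encoder_layers
    return 0  # no sparse layers requested


def is_sparse_layer(layer_idx: int, sparse_step: int) -> bool:
    # The modulo condition is an assumed placement rule; the
    # "if sparse_step > 0 else False" guard is the behavior the
    # modeling code relies on, per the PR description.
    return (layer_idx % sparse_step == sparse_step - 1) if sparse_step > 0 else False


# Single-layer model with no sparse layers requested: step is 0,
# so no layer is marked sparse.
step = encoder_sparse_step(num_layers=1, num_sparse_encoder_layers=0)
assert step == 0
assert not is_sparse_layer(0, step)

# Sparse layers requested: step > 0, so sparse layers appear at the interval.
step = encoder_sparse_step(num_layers=12, num_sparse_encoder_layers=3)
assert step == 4
assert is_sparse_layer(3, step)
```

With the guard in place, a config that requests zero sparse layers produces zero sparse layers regardless of num_layers, which is the behavior the issue asked for.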

Fixes #43335.

Before submitting

cc: @Rocketknight1

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: switch_transformers

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43336&sha=372d5d

@harshaljanjani harshaljanjani marked this pull request as ready for review January 17, 2026 13:48
@harshaljanjani
Contributor Author

The failing tests are unrelated to this change; I'd appreciate a review when you get a chance, thanks!

Development

Successfully merging this pull request may close these issues.

[BUG] SwitchTransformersConfig creates sparse layer when num_sparse_encoder_layers=0 with single layer model