[BUG] SwitchTransformersConfig creates sparse layer when num_sparse_encoder_layers=0 with single-layer model #43335

@harshaljanjani

System Info

  • transformers version: 5.0.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • huggingface_hub version: 1.3.2
  • safetensors version: 0.7.0
  • accelerate version: 1.12.0
  • Accelerate config: not installed
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
  • GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import SwitchTransformersConfig, SwitchTransformersModel

config = SwitchTransformersConfig(
    num_layers=1,
    num_sparse_encoder_layers=0,
    num_decoder_layers=1,
    num_sparse_decoder_layers=0,
    vocab_size=100,
    d_model=64,
    d_ff=128,
    num_heads=4,
    d_kv=16
)
model = SwitchTransformersModel(config)
encoder_sparse_count = sum(
    1 for block in model.encoder.block if block.is_sparse
)
print(f"Encoder sparse layers: {encoder_sparse_count}")

The bug is in configuration_switch_transformers.py (lines 151 and 157). When num_sparse_encoder_layers=0, the code sets encoder_sparse_step = num_layers (marked with a HACK comment). This interacts badly with the modeling logic at line 668: when num_layers=1, sparse_step becomes 1, which triggers the sparse_step==1 condition and incorrectly creates a sparse layer.
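To make the interaction concrete, here is a minimal standalone sketch of the two pieces of logic described above. It is not the library code: `resolve_sparse_step` and `step_forces_sparse` are hypothetical names, the non-zero branch is an assumption, and only the `sparse_step == 1` condition mentioned in this report is modeled (any other condition in the real modeling code is omitted).

```python
def resolve_sparse_step(num_layers: int, num_sparse_layers: int) -> int:
    """Hypothetical stand-in for the config-side step computation."""
    if num_sparse_layers > 0:
        # Assumption: the non-zero branch spaces sparse layers evenly.
        return num_layers // num_sparse_layers
    # Fallback described in the report (the HACK): reuse num_layers as the step.
    return num_layers


def step_forces_sparse(sparse_step: int) -> bool:
    """Hypothetical stand-in for the modeling-side `sparse_step == 1` check."""
    return sparse_step == 1


# Single-layer model with zero sparse layers requested.
step = resolve_sparse_step(num_layers=1, num_sparse_layers=0)
print(step, step_forces_sparse(step))  # prints: 1 True
```

With `num_layers=1` the fallback yields a step of 1, which is indistinguishable from "every layer is sparse" under that check. Under the same sketch, `num_layers=2` yields a step of 2 and does not trip the `sparse_step == 1` condition.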

Expected behavior

When num_sparse_encoder_layers=0 is set, zero sparse layers should be created regardless of the value of num_layers.

cc: @Rocketknight1 @sayakpaul
