Getting ValueError: Trying to set a tensor of shape torch.Size([64, 1, 7]) in "weight" (which has shape torch.Size([151936, 2048]))
The issue arises mainly because Accelerate's device_map="auto" is incompatible with Chroma-4B’s multi-component architecture.
When device_map="auto" is used, Accelerate installs alignment hooks that attempt to move and reassign parameters during the forward pass. During audio generation, these hooks incorrectly try to align codec convolution weights (shape [64, 1, 7]) with transformer embedding parameters (shape [151936, 2048]), resulting in a shape mismatch and a runtime ValueError.
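To make the mismatch concrete, here is a minimal, standard-library-only sketch of the kind of shape check that produces this error. The function `set_tensor` is a hypothetical stand-in written for illustration; it is not Accelerate's actual code, and only the two shapes and the wording of the message come from the error above.

```python
def set_tensor(name, current_shape, new_shape):
    """Hypothetical stand-in for a hook's shape check (not Accelerate's real code).

    Refuses to overwrite a parameter with a value of a different shape.
    """
    if tuple(current_shape) != tuple(new_shape):
        raise ValueError(
            f'Trying to set a tensor of shape {list(new_shape)} in "{name}" '
            f"(which has shape {list(current_shape)})"
        )
    return tuple(new_shape)


# Shapes taken from the error message above.
embedding_shape = (151936, 2048)  # transformer embedding weight
codec_conv_shape = (64, 1, 7)     # codec convolution weight

try:
    # A hook that targets the wrong module ends up doing the equivalent of this.
    set_tensor("weight", embedding_shape, codec_conv_shape)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

The two shapes do not even have the same number of dimensions, so no reshape or broadcast could reconcile them. That suggests the hook is pointing at the wrong parameter entirely, rather than the checkpoint weights being corrupted.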