Description
## Bug description
Training fails with `RuntimeError: The size of tensor a (5) must match the size of tensor b (10)` on a project migrated from SLEAP 1.4.x to 1.5. The model is correctly configured for 5 output channels (matching the 5-node skeleton), but the training data pipeline produces target tensors with 10 channels, so the loss computation fails with a tensor broadcast error.
## Expected behaviour
Training should succeed with a 5-channel model matching the 5-node skeleton in the project.
## Actual behaviour
Training fails immediately with:
RuntimeError: The size of tensor a (5) must match the size of tensor b (10) at non-singleton dimension 1
The error is raised at line 537 of `sleap_nn/training/lightning_modules.py`, in `training_step`, during loss computation. The model head correctly outputs 5 channels, but the target tensor from the data pipeline has 10 channels.
## Your personal set up
- OS: macOS (Apple Silicon - M-series chip)
- Version(s): SLEAP 1.5.x, Python 3.13
- SLEAP installation method:
uv tool install "sleap[nn]"
/Users/HL801/.local/share/uv/tools/sleap/lib/python3.13/site-packages/torch/nn/modules/loss.py:616: UserWarning: Using a target size (torch.Size([4, 10, 240, 320])) that is different to the input size (torch.Size([4, 5, 240, 320])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
  File "/Users/HL801/.local/share/uv/tools/sleap/lib/python3.13/site-packages/sleap_nn/training/lightning_modules.py", line 537, in training_step
    loss = nn.MSELoss()(y_preds, y)
  File "/Users/HL801/.local/share/uv/tools/sleap/lib/python3.13/site-packages/torch/nn/modules/loss.py", line 616, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/Users/HL801/.local/share/uv/tools/sleap/lib/python3.13/site-packages/torch/nn/functional.py", line 3868, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/Users/HL801/.local/share/uv/tools/sleap/lib/python3.13/site-packages/torch/functional.py", line 77, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)
RuntimeError: The size of tensor a (5) must match the size of tensor b (10) at non-singleton dimension 1
**Training output shows:**
- Model head: `Conv2d(32, 5, kernel_size=(1, 1))`
- Target tensor: `torch.Size([4, 10, 240, 320])`
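
For reference, here is a minimal sketch of why these two shapes cannot be combined. It re-implements the standard broadcasting rule in plain Python: two sizes are compatible when they are equal or one of them is 1. It does not call PyTorch itself, and `broadcast_mismatches` is a hypothetical helper name, not part of SLEAP or PyTorch.

```python
def broadcast_mismatches(shape_a, shape_b):
    """Return the dimensions at which two shapes cannot be broadcast together."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # A dimension is compatible when the sizes are equal or either size is 1.
    return [
        dim
        for dim, (size_a, size_b) in enumerate(zip(a, b))
        if size_a != size_b and size_a != 1 and size_b != 1
    ]


pred_shape = (4, 5, 240, 320)     # model head output reported above
target_shape = (4, 10, 240, 320)  # target tensor reported above
print(broadcast_mismatches(pred_shape, target_shape))  # [1]
```

Only dimension 1 (the channel axis) conflicts, which matches the "non-singleton dimension 1" in the error message. Batch size and spatial dimensions agree, so the mismatch appears to be confined to the number of target confidence maps.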
## Screenshots
N/A
## How to reproduce
1. Open a SLEAP project that was created in v1.4.x (now opened in v1.5)
2. Project has a single skeleton with 5 nodes
3. Attempt to train a single-instance model using the SLEAP GUI (Predict → Run Training → Single Instance)
4. Training fails immediately with the tensor size mismatch error shown above
**Additional diagnostic steps I've run through:**
- Verified only 1 skeleton with 5 nodes exists using `sleap_io.load_slp()`
- Exported clean package with only hand-labeled frames (Predict → Export Labels Package → "Labeled frames")
- Created new project from clean package and attempted training
- Error persists even with fresh export
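
The skeleton check above can be sketched roughly as follows. This is an illustrative script, not the exact commands I ran: `summarize_labels` and `summarize_file` are hypothetical helper names, and the sketch assumes the `sleap_io` objects expose `labels.skeletons`, `skeleton.nodes`, `labels.labeled_frames`, `frame.instances`, and `instance.numpy()` (one row per node).

```python
from collections import Counter


def summarize_labels(labels):
    """Summarize skeleton sizes and per-instance point counts.

    `labels` is assumed to look like a sleap_io Labels object.
    """
    # Number of nodes in every skeleton stored in the file.
    skeleton_sizes = [len(skeleton.nodes) for skeleton in labels.skeletons]

    # How many instances have each number of points (rows of instance.numpy()).
    points_per_instance = Counter()
    for frame in labels.labeled_frames:
        for instance in frame.instances:
            points_per_instance[len(instance.numpy())] += 1

    return skeleton_sizes, dict(points_per_instance)


def summarize_file(path):
    """Load a .slp file with sleap_io and summarize it."""
    import sleap_io as sio  # imported lazily so the helper above stays dependency-free

    return summarize_labels(sio.load_slp(path))


# Example (the path is a placeholder):
# print(summarize_file("labels.pkg.slp"))
```

For a project with one 5-node skeleton, the first value should be `[5]` and every instance should report 5 points. Anything else would suggest that the file contains extra skeletons or instances that the GUI does not show.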