Unexpected inference behavior affecting training metrics and instance duplication across retraining #2512
Hey SLEAP team,

1. Inference error affecting earlier saved versions

After this happens:

This suggests that a failure in a later training or inference step may affect access to model metadata in previously saved versions.

2. Repeated duplication of instances in Human-in-the-loop training

I would like to understand whether this behavior is expected, and whether there is a recommended workflow to prevent instance duplication when retraining from an existing model.

I would greatly appreciate any guidance or clarification regarding these behaviors, including whether they are expected or reflect an issue in my workflow or setup.

Thank you,
Hi @RotemYehuda,

Thanks for the report! We're working on the second issue, but for the first one, could you clarify a couple of things?

1. An error during inference, not training? What error do you see? Is there anything in the terminal? Is this the inference on the suggestion frames (or another "Predict on:" target) that happens immediately after training, or are you running Training -> Run Inference... separately?
2. How do you mean they're no longer accessible? After training, the loss viewer window closes by default, so the curves aren't typically accessible anyway. Where are you trying to access them -- in the "Evaluation Metrics for Trained Models..." window? Just looking at the model folder?

The model metadata is saved in the model folder, not the SLP file, so the two shouldn't affect each other at all. Let us know how you're trying to access this metadata; I think that'll help clear things up a bit so we can troubleshoot.

Thanks!
Talmo
Hi @talmo,

More specifically, training itself completes successfully, but the error happens during the automatic inference and evaluation step that runs at the end of training, when using Predict → Run Training with "Predict on: suggested frames". After this happens, the following issues appear:

I understand that model metadata is stored in the model folder rather than the SLP file, but empirically it seems that a failure during this training evaluation step can leave the project in a state where both metrics access and label version separation are affected.

I've also attached a screenshot of the GUI error dialog that appears at the end of training, for completeness. The detailed error information is in the terminal traceback described above.

Thanks again for your help,
Hi @RotemYehuda,

Thanks for the follow-up and the detailed information - this was very helpful for tracking down the issue.

Issue 1: Labels bleeding across project versions

We identified and fixed a critical bug in v1.6.0a0 where videos with the same resolution could be incorrectly matched during internal operations (#2535). This, combined with a redesigned video matching algorithm in sleap-io v0.6.0 (#300), should prevent the cross-contamination you experienced between project versions. The PermissionError itself is likely a Windows file-locking issue (antivirus scanning, Windows Search indexing, etc.), but the downstream corruption - labels appearing in earlier versions - should no longer occur.

Issue 2: Instance duplication in human-in-the-loop

v1.6.0a0 adds two features to address this:
Would you be willing to test v1.6.0a0?

This is a pre-release, but it includes these fixes along with many other improvements. See the v1.6.0a0 release notes for full details. To upgrade (Windows with an NVIDIA GPU):

```
uv tool install --force --python 3.12 "sleap[nn]==1.6.0a0" --with "sleap-io==0.6.0" --with "sleap-nn==0.1.0a0" --prerelease allow --index https://download.pytorch.org/whl/cu128 --index https://pypi.org/simple
```

If you encounter any issues or the problems persist, please let us know!

Cheers,
Talmo & Claude

Extended technical analysis

Issue 1: Root cause

The "labels bleeding" symptom pointed to incorrect video matching. When multiple videos in a project share the same resolution, the matching logic could pair labels with the wrong video.

Key fixes:
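To illustrate the matching pitfall, here is a minimal Python sketch (not SLEAP's actual implementation; the `Video` class and function names are hypothetical): matching videos by resolution alone becomes ambiguous as soon as two videos share a shape, while requiring the filename to agree first removes the ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    filename: str
    shape: tuple  # (frames, height, width, channels)


def match_by_shape(target, candidates):
    """Ambiguous: every candidate with the same resolution matches."""
    return [v for v in candidates if v.shape[1:3] == target.shape[1:3]]


def match_by_identity(target, candidates):
    """Safer: require the filename to agree before falling back to shape."""
    by_name = [v for v in candidates if v.filename == target.filename]
    return by_name if by_name else match_by_shape(target, candidates)


cams = [Video("cam1.mp4", (100, 1080, 1920, 1)),
        Video("cam2.mp4", (200, 1080, 1920, 1))]
target = Video("cam1.mp4", (100, 1080, 1920, 1))

print(len(match_by_shape(target, cams)))     # 2 - both 1080x1920 videos match
print(len(match_by_identity(target, cams)))  # 1 - only cam1.mp4 matches
```

With shape-only matching, labels for `cam1.mp4` could just as well be attached to `cam2.mp4`, which is exactly the kind of cross-contamination described above.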
Issue 2: Root cause

This is expected behavior without explicit handling - when inference runs, it adds new predicted instances without checking whether predictions already exist on those frames. Each training cycle therefore accumulates more predictions.

Key fixes:
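As a sketch of the accumulation problem and one way to handle it (hypothetical data structures, not sleap-nn's actual code): if a merge appends new predictions, every human-in-the-loop cycle grows the prediction list, whereas replacing the predictions on each re-predicted frame, while leaving user labels untouched, keeps the count stable.

```python
def merge_predictions(frames, new_preds):
    """Replace stale predictions per frame instead of accumulating them.

    frames: dict frame_idx -> {"user": [...], "pred": [...]}
    new_preds: dict frame_idx -> list of newly predicted instances
    """
    for idx, preds in new_preds.items():
        entry = frames.setdefault(idx, {"user": [], "pred": []})
        entry["pred"] = list(preds)  # overwrite, don't append
    return frames


# One labeled frame with an old prediction, then a new inference pass.
frames = {0: {"user": ["label_a"], "pred": ["old_p"]}}
frames = merge_predictions(frames, {0: ["new_p"], 1: ["new_q"]})

print(frames[0]["pred"])  # ['new_p'] - old prediction replaced
print(frames[0]["user"])  # ['label_a'] - user labels untouched
```

An append-based merge (`entry["pred"] += preds`) would instead yield `['old_p', 'new_p']` after one cycle, and the list would keep growing on every retrain.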