Unexpected inference behavior affecting training metrics and instance duplication across retraining #2512
Hey SLEAP team,

1. Inference error affecting earlier saved versions

After this happens:

This suggests that a failure in a later training or inference step may affect access to model metadata in previously saved versions.

2. Repeated duplication of instances in Human-in-the-loop training

I would like to understand whether this behavior is expected, and whether there is a recommended workflow to prevent instance duplication when retraining from an existing model.

I would greatly appreciate any guidance or clarification regarding these behaviors, including whether they are expected or reflect an issue in my workflow or setup.

Thank you,
Hi @RotemYehuda,

Thanks for the report! We're working on the second issue, but for the first one, could you clarify a couple of things?

1. An error during inference, not training? What error do you see? Is there anything in the terminal? Is this the inference on the suggestion frames (or another "Predict on:" target) that happens immediately after training, or are you running Training -> Run Inference... separately?
2. How do you mean they're no longer accessible? After training, the loss viewer window closes by default, so the curves aren't typically accessible anyway. Where are you trying to access them -- in the "Evaluation Metrics for Trained Models..." window? Just looking at the model folder?

The model metadata is saved in the model folder, not the SLP file, so the two shouldn't affect each other at all. Let us know how you're trying to access this metadata; I think that'll help clear things up a bit so we can troubleshoot.

Thanks!
Talmo
Hi @talmo,

More specifically, training itself completes successfully, but the error happens during the automatic inference and evaluation step that runs at the end of training, when using Predict → Run Training with "Predict on: suggested frames". After this happens, the following issues appear:

I understand that model metadata is stored in the model folder rather than the SLP file, but empirically it seems that a failure during this training evaluation step can leave the project in a state where both metrics access and label version separation are affected.

I've also attached a screenshot of the GUI error dialog that appears at the end of training, for completeness. The detailed error information is in the terminal traceback described above.

Thanks again for your help,
Hi @RotemYehuda,

Thanks for the follow-up and the detailed information - this was very helpful for tracking down the issue.

Issue 1: Labels bleeding across project versions

We identified and fixed a critical bug in v1.6.0a0 where videos with the same resolution could be incorrectly matched during internal operations (#2535). This, combined with a redesigned video matching algorithm in sleap-io v0.6.0 (#300), should prevent the cross-contamination you experienced between project versions. The PermissionError itself is likely a Windows file-locking issue (antivirus scanning, Windows Search indexing, etc.), but the downstream corruption - labels appearing in earlier versions - should no longer occur.

Issue 2: Instance duplication in human-in-the-loop

v1.6.0a0 adds two features to address this:
Would you be willing to test v1.6.0a0?

This is a pre-release, but it includes these fixes along with many other improvements. See the v1.6.0a0 release notes for full details. To upgrade (Windows with an NVIDIA GPU):

```
uv tool install --force --python 3.12 "sleap[nn]==1.6.0a0" --with "sleap-io==0.6.0" --with "sleap-nn==0.1.0a0" --prerelease allow --index https://download.pytorch.org/whl/cu128 --index https://pypi.org/simple
```

If you encounter any issues or the problems persist, please let us know!

Cheers,
Talmo & Claude

Extended technical analysis

Issue 1: Root cause

The "labels bleeding" symptom pointed to incorrect video matching. When multiple videos in a project share the same resolution, the matching logic could pair labels with the wrong video.

Key fixes:
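To illustrate the matching pitfall, here is a minimal Python sketch (not SLEAP's actual implementation; the `Video` class and function names are hypothetical): matching videos by resolution alone becomes ambiguous as soon as two videos share a shape, while requiring the filename to agree first removes the ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    filename: str
    shape: tuple  # (frames, height, width, channels)


def match_by_shape(target, candidates):
    """Ambiguous: every candidate with the same resolution matches."""
    return [v for v in candidates if v.shape[1:3] == target.shape[1:3]]


def match_by_identity(target, candidates):
    """Safer: require the filename to agree before falling back to shape."""
    by_name = [v for v in candidates if v.filename == target.filename]
    return by_name if by_name else match_by_shape(target, candidates)


cams = [Video("cam1.mp4", (100, 1080, 1920, 1)),
        Video("cam2.mp4", (200, 1080, 1920, 1))]
target = Video("cam1.mp4", (100, 1080, 1920, 1))

print(len(match_by_shape(target, cams)))     # 2 - both 1080x1920 videos match
print(len(match_by_identity(target, cams)))  # 1 - only cam1.mp4 matches
```

With shape-only matching, labels for `cam1.mp4` could just as well be attached to `cam2.mp4`, which is exactly the kind of cross-contamination described above.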
Issue 2: Root cause

This is expected behavior without explicit handling - when inference runs, it adds new predicted instances without checking whether predictions already exist on those frames. Each training cycle therefore accumulates more predictions.

Key fixes:
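As a sketch of the accumulation problem and one way to handle it (hypothetical data structures, not sleap-nn's actual code): if a merge appends new predictions, every human-in-the-loop cycle grows the prediction list, whereas replacing the predictions on each re-predicted frame, while leaving user labels untouched, keeps the count stable.

```python
def merge_predictions(frames, new_preds):
    """Replace stale predictions per frame instead of accumulating them.

    frames: dict frame_idx -> {"user": [...], "pred": [...]}
    new_preds: dict frame_idx -> list of newly predicted instances
    """
    for idx, preds in new_preds.items():
        entry = frames.setdefault(idx, {"user": [], "pred": []})
        entry["pred"] = list(preds)  # overwrite, don't append
    return frames


# One labeled frame with an old prediction, then a new inference pass.
frames = {0: {"user": ["label_a"], "pred": ["old_p"]}}
frames = merge_predictions(frames, {0: ["new_p"], 1: ["new_q"]})

print(frames[0]["pred"])  # ['new_p'] - old prediction replaced
print(frames[0]["user"])  # ['label_a'] - user labels untouched
```

An append-based merge (`entry["pred"] += preds`) would instead yield `['old_p', 'new_p']` after one cycle, and the list would keep growing on every retrain.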