Description
Describe the bug
_extract_confidence_from_via_tracks_df in movement/io/load_bboxes.py uses an
all(...) check to decide whether to extract confidence scores from the
region_attributes column. If even a single row is missing the confidence key,
the function discards all confidence values across the entire file and returns an
array of NaNs — including for rows that do have valid confidence scores.
This causes silent data loss when working with VIA tracks files where confidence
annotations are only partially provided, which is a common real-world scenario.
To Reproduce
import numpy as np
import pandas as pd
from movement.io import load_bboxes
# Simulate region_attributes where only some rows have confidence
region_attributes = [
    {"confidence": 0.5},
    {},  # missing confidence
    {"confidence": 0.2},
]
# The function currently returns [NaN, NaN, NaN]
# instead of the expected [0.5, NaN, 0.2]
The root cause is in _extract_confidence_from_via_tracks_df:
# Current (buggy) behaviour
if all(["confidence" in d for d in region_attributes_dicts]):
    bbox_confidence = _via_attribute_column_to_numpy(...)
else:
    # This branch throws away ALL confidence values
    bbox_confidence = np.full((df.shape[0], 1), np.nan).squeeze()
Expected behaviour
Rows that contain a confidence key should retain their value.
Only rows where the key is absent should be set to NaN.
The expected output for the example above is [0.5, NaN, 0.2].
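One possible way to get this per-row behaviour is sketched below. It is only an illustration, not the actual movement code: `extract_confidence_per_row` is a hypothetical helper name, and it assumes the region attributes have already been parsed into a list of dicts (like `region_attributes_dicts` in the snippet above).

```python
import numpy as np


def extract_confidence_per_row(region_attributes_dicts):
    """Hypothetical helper: one confidence value per row, NaN where absent.

    Assumes each element is an already-parsed dict of region attributes.
    """
    return np.array(
        # dict.get falls back to NaN only for the rows missing the key
        [float(d.get("confidence", np.nan)) for d in region_attributes_dicts],
        dtype=float,
    )


region_attributes = [
    {"confidence": 0.5},
    {},  # missing confidence
    {"confidence": 0.2},
]
confidence = extract_confidence_per_row(region_attributes)
```

The point of the design is that the fallback is decided row by row instead of by a single all(...) check over the whole file, so a missing key in one row does not affect the others.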
Additional context
- Affected file: movement/io/load_bboxes.py
- Affected function: _extract_confidence_from_via_tracks_df
- No existing test covers the partial-confidence case; the current tests only cover all-present and all-absent scenarios.
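A regression test for the missing case could look roughly like the sketch below. It is self-contained for illustration: `confidence_or_nan` is a hypothetical stand-in for the function under test, not movement's implementation, and a real test would instead call the loader on a VIA tracks fixture.

```python
import numpy as np


def confidence_or_nan(dicts):
    # Hypothetical stand-in for the function under test
    return np.array([d.get("confidence", np.nan) for d in dicts], dtype=float)


# The two scenarios already covered, plus the partial-confidence case
CASES = {
    "all_present": ([{"confidence": 0.5}, {"confidence": 0.2}], [0.5, 0.2]),
    "all_absent": ([{}, {}], [np.nan, np.nan]),
    "partial": (
        [{"confidence": 0.5}, {}, {"confidence": 0.2}],
        [0.5, np.nan, 0.2],
    ),
}


def run_cases():
    for name, (dicts, expected) in CASES.items():
        # assert_array_equal treats NaNs in matching positions as equal
        np.testing.assert_array_equal(
            confidence_or_nan(dicts), np.array(expected), err_msg=name
        )
    return len(CASES)


checked = run_cases()
```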