Feature Request: Filter for Regions with Minimal NaNs #835

@AnandMayank

Description

Is your feature request related to a problem? Please describe.

Trajectory analyses are often affected by missing data (NaNs), which can bias metrics or invalidate results. Manually filtering for regions with fewer NaNs is tedious and subjective. An automated filter would improve reproducibility and streamline analysis.

Describe the solution you'd like

A filter function, e.g., filter_valid_regions, that automatically selects and returns the longest (or best) contiguous region(s) of a trajectory (or batch of trajectories) with minimal or no NaNs, based on a user-defined threshold for missing data. This filter should work at the dataset level and return a filtered DataArray or Dataset ready for downstream analysis.
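To make the intended behaviour concrete, here is a rough sketch of the core selection logic on a plain NumPy array with time on axis 0. It is only an illustration under my own assumptions, not an existing `movement` API: the helper names `longest_valid_window` and `filter_valid_region` are hypothetical, a frame counts as "missing" if any of its values is NaN, and the "NaN fraction" of a window is the share of missing frames in it.

```python
import numpy as np


def longest_valid_window(missing, max_nan_fraction=0.2):
    """Return (start, stop) of the longest half-open window [start, stop)
    whose fraction of missing frames is <= max_nan_fraction.

    Hypothetical helper; `missing` is a 1-D boolean-like sequence.
    """
    missing = np.asarray(missing, dtype=bool)
    # A window [i, j) qualifies when sum(missing[i:j]) <= f * (j - i),
    # i.e. when prefix[j] <= prefix[i] for the shifted prefix sums below.
    shifted = missing.astype(float) - max_nan_fraction
    prefix = np.concatenate(([0.0], np.cumsum(shifted)))
    eps = 1e-9  # tolerance for floating-point round-off in the prefix sums

    # Candidate starts: positions where the prefix sum reaches a new maximum.
    starts = []
    for i, p in enumerate(prefix):
        if not starts or p > prefix[starts[-1]]:
            starts.append(i)

    # Scan candidate stops from the right, pairing each with the earliest
    # start that still satisfies the threshold.
    best = (0, 0)
    for j in range(len(prefix) - 1, -1, -1):
        while starts and prefix[starts[-1]] >= prefix[j] - eps:
            i = starts.pop()
            if j - i > best[1] - best[0]:
                best = (i, j)
    return best


def filter_valid_region(data, max_nan_fraction=0.2, min_length=1):
    """Slice `data` (time on axis 0) to its longest acceptable window.

    Returns None if no window of at least `min_length` frames qualifies.
    """
    data = np.asarray(data, dtype=float)
    # Assumption: a frame is missing if any value in that frame is NaN.
    other_axes = tuple(range(1, data.ndim))
    missing = np.isnan(data).any(axis=other_axes)
    start, stop = longest_valid_window(missing, max_nan_fraction)
    if stop - start < min_length:
        return None
    return data[start:stop]
```

For an `xarray.DataArray`, the same `(start, stop)` pair could presumably be applied with `position.isel(time=slice(start, stop))`. Whether NaNs should be counted per frame, per keypoint, or per individual is an open design question that this sketch does not settle.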

Example usage:

```python
import xarray as xr
from movement.filters import filter_valid_regions
from movement.kinematics import straightness_index

# Assume 'position' is an xarray.DataArray with shape (time, space, keypoint, individual)
# Filter to region(s) with <20% NaNs
filtered = filter_valid_regions(
    position,
    max_nan_fraction=0.2,
    min_length=100,
    dim="time",
)

# Now you can compute metrics like straightness_index on the filtered data
si = straightness_index(filtered)
```

Describe alternatives you've considered

Manual masking or slicing, which is error-prone and not scalable for large datasets.

Additional context

This filter would complement the new straightness_index and other trajectory metrics, enabling robust, automated motif discovery and statistical analysis even in the presence of missing data.
