Contributor

@AlbertvanHouten AlbertvanHouten commented Oct 20, 2025

This pull request introduces the new experimental dataset into the OTX dataset classes. Much of the logic, such as the management of image color channels and the tiling implementation, has been migrated to Datumaro.

Summary

How to test

Checklist

  • The PR title and description are clear and descriptive
  • I have manually tested the changes
  • All changes are covered by automated tests
  • All related issues are linked to this PR (if applicable)
  • Documentation has been updated (if applicable)

AlbertvanHouten and others added 16 commits September 4, 2025 10:10
Signed-off-by: Albert van Houten <[email protected]>
Co-authored-by: Leonardo Lai <[email protected]>
…4751)

Signed-off-by: Albert van Houten <[email protected]>
Co-authored-by: Grégoire Payen de La Garanderie <[email protected]>
…ing_extensions into feature/datumaro

# Conflicts:
#	library/src/otx/data/dataset/base_new.py
#	library/src/otx/data/dataset/classification_new.py
#	library/src/otx/data/dataset/detection_new.py
#	library/src/otx/data/dataset/instance_segmentation_new.py
#	library/src/otx/data/dataset/keypoint_detection_new.py
#	library/src/otx/data/dataset/segmentation_new.py
#	library/src/otx/data/entity/sample.py
Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
Co-authored-by: Albert van Houten <[email protected]>
@github-actions github-actions bot added the DEPENDENCY, TEST, and BUILD labels Oct 20, 2025
Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
…ing_extensions into feature/datumaro

Signed-off-by: Albert van Houten <[email protected]>

# Conflicts:
#	library/pyproject.toml
#	library/src/otx/data/dataset/anomaly.py
#	library/src/otx/data/dataset/base.py
#	library/src/otx/data/dataset/classification.py
#	library/src/otx/data/dataset/detection.py
#	library/src/otx/data/dataset/instance_segmentation.py
#	library/src/otx/data/dataset/keypoint_detection.py
#	library/src/otx/data/dataset/segmentation.py
#	library/src/otx/data/dataset/tile.py
#	library/src/otx/data/factory.py
#	library/src/otx/data/module.py
#	library/src/otx/data/transform_libs/torchvision.py
#	library/tests/unit/data/samplers/test_class_incremental_sampler.py
#	library/tests/unit/data/utils/test_utils.py
@AlbertvanHouten AlbertvanHouten changed the title from "Feature/datumaro" to "Experimental datumaro implementation" Oct 29, 2025
@AlbertvanHouten AlbertvanHouten marked this pull request as ready for review October 29, 2025 09:24
@AlbertvanHouten AlbertvanHouten requested a review from a team as a code owner October 29, 2025 09:24
Copilot AI review requested due to automatic review settings October 29, 2025 09:24
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements an experimental Datumaro dataset integration for OTX, transitioning from legacy Datumaro components to the new experimental Dataset API. The changes introduce a new sample-based architecture while maintaining compatibility with existing OTX functionality.

Key changes:

  • Migration from legacy Datumaro components to experimental Dataset API with schema-based conversion
  • Introduction of new OTXSample-based data entities with PyTree registration for TorchVision compatibility
  • Replacement of legacy polygon handling with numpy ragged arrays for better performance
  • Comprehensive test updates and new test implementations for the updated dataset architecture

Reviewed Changes

Copilot reviewed 59 out of 59 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| pyproject.toml | Updates Datumaro dependency to experimental branch version |
| library/src/otx/data/dataset/*.py | Implements new dataset classes using experimental Datumaro with sample-based architecture |
| library/src/otx/data/entity/sample.py | Adds new OTXSample classes with PyTree registration for transform compatibility |
| library/tests/unit/types/test_label.py | Updates label tests to use new hierarchical label categories |
| library/tests/unit/data/transform_libs/test_torchvision.py | Converts polygon handling from Datumaro objects to numpy arrays |

Comment on lines +143 to +148
fake_polygons = np.array(
    [
        np.array([[10, 10], [50, 10], [50, 50], [10, 50]]),  # Rectangle polygon for first object
        np.array([[60, 60], [100, 60], [100, 100], [60, 100]]),  # Rectangle polygon for second object
    ]
)

Copilot AI Oct 29, 2025

The polygon data structure has changed from Datumaro Polygon objects to numpy ragged arrays, but the test doesn't validate that the new format maintains the same geometric properties. Consider adding assertions to verify that the polygon areas and vertex coordinates are equivalent between the old and new formats.
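
A minimal sketch of such assertions, assuming each polygon is stored as an (N, 2) array of (x, y) vertices inside an object-dtype container. `polygon_area` is a hypothetical helper, not part of OTX or Datumaro. Note that `np.array` over two equally shaped (4, 2) arrays, as in the test above, yields a regular (2, 4, 2) array, so the sketch builds the ragged container explicitly and uses polygons with different vertex counts:

```python
import numpy as np


def polygon_area(points: np.ndarray) -> float:
    """Area of a simple polygon given as an (N, 2) array, via the shoelace formula."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(float(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


# Ragged container: one object-dtype slot per polygon, each with its own vertex count.
polygons = np.empty(2, dtype=object)
polygons[0] = np.array([[10, 10], [50, 10], [50, 50], [10, 50]], dtype=float)  # 40 x 40 square
polygons[1] = np.array([[60, 60], [100, 60], [80, 100]], dtype=float)  # triangle, base 40, height 40

areas = [polygon_area(p) for p in polygons]
assert areas == [1600.0, 800.0]
```

Comparing areas and vertex coordinates like this against values computed from the legacy format would cover the equivalence the comment asks for.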

    p = np.asarray(polygon.points)
    p[0::2] = width - p[0::2]
    return p.tolist()
def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:

Copilot AI Oct 29, 2025

The function modifies the input polygon array in-place, which could cause issues if the original array is used elsewhere. The function should create a copy before modification to avoid unintended side effects.

Suggested change
- def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:
+ def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:
+     polygon = polygon.copy()
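
One way the non-mutating version could look, assuming the polygon is an (N, 2) array of (x, y) vertices. The PR excerpt does not show the actual array layout, so the column indexing here is an assumption. Note that the old `p[0::2]` slice selects the x values only for a flat [x0, y0, x1, y1, ...] layout; on an (N, 2) array it would select every other row instead.

```python
import numpy as np


def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:
    """Return a horizontally mirrored copy of the polygon; the input is left untouched."""
    flipped = polygon.copy()  # avoid mutating the caller's array
    flipped[:, 0] = width - flipped[:, 0]  # mirror the x column (assumed (N, 2) layout)
    return flipped


original = np.array([[10.0, 10.0], [50.0, 10.0], [50.0, 50.0]])
snapshot = original.copy()

# Flipping twice with the same width should give back the starting coordinates.
restored = revert_hflip(revert_hflip(original, width=100), width=100)
assert np.array_equal(original, snapshot)
assert np.array_equal(restored, original)
```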

# Conflicts:
#	library/tests/unit/data/dataset/test_base.py
#	library/tests/unit/data/test_tiling.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 81 out of 82 changed files in this pull request and generated 4 comments.


pl_module (LightningModule): pl module.
batch_size (int): batch size.
"""
device = trainer.strategy.root_device

Copilot AI Dec 4, 2025

The addition of 'mps' device type is good, but consider documenting why MPS (Apple Metal Performance Shaders) devices should skip GPU memory monitoring, similar to CPU and XPU.

Suggested change
- device = trainer.strategy.root_device
+ device = trainer.strategy.root_device
+ # Skip GPU memory monitoring for CPU, XPU, and MPS (Apple Metal Performance Shaders) devices.
+ # MPS devices do not support the same GPU memory monitoring APIs as CUDA devices,
+ # so memory stats are unavailable or unreliable, similar to CPU and XPU.
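
The guard described in that comment might be sketched as follows. `should_monitor_gpu_memory` and `UNMONITORED_DEVICE_TYPES` are hypothetical names used for illustration, not the OTX implementation, and the sketch takes a plain device-type string so that it runs without PyTorch installed.

```python
# Device types for which GPU memory stats are skipped (assumed set, taken from the comment above).
UNMONITORED_DEVICE_TYPES = frozenset({"cpu", "xpu", "mps"})


def should_monitor_gpu_memory(device_type: str) -> bool:
    """Return True only for device types not listed as unmonitored."""
    return device_type.lower() not in UNMONITORED_DEVICE_TYPES


decisions = {name: should_monitor_gpu_memory(name) for name in ("cuda", "cpu", "xpu", "mps")}
assert decisions == {"cuda": True, "cpu": False, "xpu": False, "mps": False}
```

In a callback, the string would presumably come from `trainer.strategy.root_device.type`, since a `torch.device` exposes its kind through the `type` attribute.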

Signed-off-by: Albert van Houten <[email protected]>
@AlbertvanHouten AlbertvanHouten requested a review from a team as a code owner December 4, 2025 09:02
Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
uses: dtolnay/rust-toolchain@b9ed5a8fb8afb645c9cfaffecc60fc946d28f857 # stable
# TODO remove after switching to an official Datumaro release with pre-built wheels
- name: Installing Rust toolchain
uses: dtolnay/rust-toolchain@6d653acede28d24f02e3cd41383119e8b1b35921
Contributor

Why this SHA? Which version is it? Please add that information in a comment. I checked the latest versions and can't see this commit:
commit 0d231036f5e097127fd969f80fb8e44d7394f7f9 (HEAD -> 1.91.1, origin/1.91.1, origin/1.91)
commit a4a18948872904a1ea9bf180c73ce0a3ce1ceaab (HEAD -> 1.91.0, origin/1.91.0)
The commit you provided looks like a commit from the master branch:
commit 6d653acede28d24f02e3cd41383119e8b1b35921
Shouldn't we use one of the release versions?

Contributor

It's probably taken from #5054, which I copied from the Datumaro repo [ref].

The maintainer of this action dtolnay/rust-toolchain keeps updating his project but doesn't make official GH releases (actually there's one release - v1 - but it's from 2022).

Signed-off-by: Albert van Houten <[email protected]>
- Remove data module/loader dependency from simple model tests. Add a fixture that generates a sample batch.
- Update rtdetr model to stack OTXDataBatch images if they are provided as a list.

Signed-off-by: Albert van Houten <[email protected]>
Move memory cleanup to model tests only

Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>

Labels

  • BUILD
  • DEPENDENCY: Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM
  • Geti Tune Backend: Issues related to Geti Tune Studio backend
  • TEST: Any changes in tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate OTX with the new dataset class

7 participants