Commit d6e9b9b
Lazy-loading during attribute access (#1882)
This PR changes the lazy-loading behaviour so that lazy loading
happens when a sample attribute is accessed rather than when the sample
is fetched. This allows a more fine-grained approach in which
attributes such as images are fetched only if they are actually used.
For instance, if a piece of code only needs to access the sample labels,
the sample image is never loaded, which avoids the cost of loading it.
E.g.:
```
sample = dataset[0]  # Old behaviour: the image is loaded here, even if it is never used.
sample.label  # New behaviour: returns the label without loading sample.image.
sample.image  # New behaviour: the image is loaded here, on first access.
```
To do so, we track which converters are needed for which attribute;
when an attribute is accessed, we run the required converters. We
distinguish between direct attributes (like labels), which don't
require any converters, and lazy attributes. Direct attributes are
available as plain object attributes, with no magic. Lazy attributes behave
like a class `@property`, calling a function to evaluate the
attribute value. Once a lazy attribute has been evaluated, the computed
value is stored in the sample and it behaves like a direct
attribute.
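The direct/lazy distinction described above can be sketched in plain Python. This is a minimal illustration, not the code from this PR: the `LazySample` class, the `load_image` loader, and the use of `__getattr__` for the fallback are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


class LazySample:
    """Hypothetical stand-in for a sample with direct and lazy attributes."""

    def __init__(self, direct: Dict[str, Any], lazy: Dict[str, Callable[[], Any]]):
        # Direct attributes become plain instance attributes.
        self.__dict__.update(direct)
        # Lazy attributes are kept as zero-argument loaders until first use.
        self._lazy = dict(lazy)

    def __getattr__(self, name: str) -> Any:
        # __getattr__ is only called when normal lookup fails, so direct and
        # already-evaluated attributes never reach this method.
        lazy = self.__dict__.get("_lazy", {})
        if name not in lazy:
            raise AttributeError(name)
        value = lazy.pop(name)()
        # Store the result so it behaves like a direct attribute from now on.
        setattr(self, name, value)
        return value


loads: List[str] = []  # records every time the (hypothetical) loader runs


def load_image() -> List[List[int]]:
    loads.append("image")
    return [[0, 0], [0, 0]]  # placeholder pixel data


sample = LazySample({"label": 3}, {"image": load_image})
```

Accessing `sample.label` never calls `load_image`; the first access to `sample.image` calls it once, and later accesses read the stored value.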
The existing implementation of attribute renaming was problematic
because the renaming converter was inserted as the last converter and
all attributes depended on it, which prevented per-attribute lazy
loading. To fix this problem, this PR reworks attribute renaming to
insert the converter as the first converter. Since this converter is the
first, it can also be executed as a batch converter.
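To see why the position of the renaming step matters, here is a small dependency-tracing sketch. The `Converter` dataclass, the converter names, and the column names are hypothetical and only model the idea; they are not Datumaro's converter classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Converter:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def required_converters(pipeline: List[Converter], attribute: str) -> List[str]:
    """Walk the pipeline backwards and collect the converters an attribute depends on."""
    needed = {attribute}
    chain: List[str] = []
    for conv in reversed(pipeline):
        if needed & set(conv.outputs):
            chain.append(conv.name)
            # The converter's outputs are satisfied; its inputs are now required.
            needed = (needed - set(conv.outputs)) | set(conv.inputs)
    return list(reversed(chain))


# Renaming last: it consumes every column, so "label" also depends on image loading.
rename_last = [
    Converter("load_image", ("img_path",), ("img",)),
    Converter("rename", ("img", "cls"), ("image", "label")),
]

# Renaming first: only the cheap renaming step stands between "cls" and "label".
rename_first = [
    Converter("rename", ("img_path", "cls"), ("image_path", "label")),
    Converter("load_image", ("image_path",), ("image",)),
]

label_deps_last = required_converters(rename_last, "label")
label_deps_first = required_converters(rename_first, "label")
image_deps_first = required_converters(rename_first, "image")
```

In this model, `label_deps_last` includes `load_image`, whereas `label_deps_first` contains only `rename`, so the label no longer drags the image loader along.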
**Refactoring of converter pathfinding and lazy converter handling:**
* The logic for attribute renaming and deletion during schema conversion
has been reworked. Now, an initial attribute remapping step is
introduced at the start of the conversion process, and post-processing
steps for renaming/deletion have been removed for clarity and
correctness. (`_create_initial_renaming_converter` replaces
`_create_post_processing_for_semantic`, with related changes in
`_find_conversion_path_for_semantic`)
[[1]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L638-R677)
[[2]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L714-R715)
[[3]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L741-R731)
[[4]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L769-R755)
* The handling of lazy converters has been improved: instead of a flat
list, lazy converters are now tracked as a dictionary mapping output
attribute names to lists of converters, allowing for more precise and
efficient application of lazy conversions. This affects both the return
type of `ConversionPaths` and the internal logic for separating batch
and lazy converters.
[[1]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L54-R54)
[[2]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41R824)
[[3]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L856-R833)
[[4]](diffhunk://#diff-fb2a908f10fa67b50f19323df8324a854db0618c5c61595394aa828af411ca41L887-R900)
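The move from a flat list to a per-attribute dictionary can be sketched as follows. This is a simplified illustration under stated assumptions: the `Converter` dataclass, its `lazy` flag, and `split_converters` are hypothetical, and each lazy converter is grouped only under the attributes it directly outputs, whereas the real logic may also record upstream converters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Converter:
    name: str
    outputs: Tuple[str, ...]
    lazy: bool = False  # hypothetical flag marking converters that are deferred


def split_converters(
    converters: List[Converter],
) -> Tuple[List[Converter], Dict[str, List[Converter]]]:
    """Separate batch converters from lazy ones, keying lazy ones by output attribute."""
    batch: List[Converter] = []
    lazy_by_attr: Dict[str, List[Converter]] = defaultdict(list)
    for conv in converters:
        if conv.lazy:
            for attr in conv.outputs:
                lazy_by_attr[attr].append(conv)
        else:
            batch.append(conv)
    return batch, dict(lazy_by_attr)


converters = [
    Converter("rename", ("image_path", "label")),
    Converter("load_image", ("image",), lazy=True),
]
batch, lazy = split_converters(converters)
```

With a dictionary, accessing an attribute only needs a lookup by name (`lazy["image"]`) rather than a scan over every lazy converter.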
**Dataset and Sample class improvements:**
* The `Sample` and `Dataset` classes have been updated to support the
new lazy converter structure, storing and exposing lazy converters as
dictionaries keyed by attribute name. The initialization and
`from_dataframe` logic have been updated accordingly.
[[1]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fR48-R55)
[[2]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL153-R172)
[[3]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL169-R190)
[[4]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL186-R200)
* The `Sample` class now tracks applied converters and stores schema and
dataframe references for dynamic property loading, enabling more
flexible and efficient lazy evaluation.
**Other improvements and minor changes:**
* The `AttributeRemapperConverter` now uses `rename()` to rename columns
and keeps all other columns, instead of selecting only mapped columns.
* The image converter now creates output columns with explicit dtype
using schema information, improving compatibility and correctness.
* Minor code cleanup and improvements, such as more precise `__repr__`
output for `Sample` and updated imports.
[[1]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL56-R68)
[[2]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fR9)
[[3]](diffhunk://#diff-4ac196ddc4dc8e6d33daf684ded18886ff8774fadb8b6cbd4bfa88ca424bb34fL17-R19)
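The change to `AttributeRemapperConverter` described in the list above (renaming columns while keeping the others, instead of selecting only mapped columns) can be illustrated with a plain dict standing in for a dataframe. The helper names and column names below are hypothetical, and no dataframe library is used.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


def select_mapped(columns: Columns, mapping: Dict[str, str]) -> Columns:
    """Old behaviour as described: keep only the mapped columns, under their new names."""
    return {new: columns[old] for old, new in mapping.items()}


def rename_keep_rest(columns: Columns, mapping: Dict[str, str]) -> Columns:
    """New behaviour as described: rename mapped columns and keep all the others."""
    return {mapping.get(name, name): values for name, values in columns.items()}


columns: Columns = {"img_path": ["a.png"], "cls": [3], "extra": [0.5]}
mapping = {"cls": "label"}

selected = select_mapped(columns, mapping)
renamed = rename_keep_rest(columns, mapping)
```

Keeping the unmapped columns matters once renaming runs first: later converters still need their inputs (here `img_path`) to be present.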
---------
Signed-off-by: Grégoire Payen de La Garanderie <[email protected]>
1 parent 0df650a commit d6e9b9b
6 files changed, +234 -197 lines changed
- src/datumaro/experimental
- tests/unit/experimental