Commit 3f3287b
Sample and dtype validation (#1966)
This pull request introduces improvements to type safety, validation,
and schema handling for experimental dataset fields, with updates across
core modules and tests. The main changes include stricter type
validation for sample attributes, standardized handling of `dtype` for
fields, and updates to tests to reflect these improvements.
**Validation and Type Safety Enhancements**
* Added a `validate()` method to the `Sample` class to check attribute
presence and type correctness against the inferred schema, including
support for `Union` and `Callable` types. Validation is now called after
initialization.
* Improved attribute type validation logic to correctly handle complex
types such as `Union` and `Callable`.
**Field `dtype` Standardization**
* Updated the `Field` base class to ensure that `dtype` is always a
Polars `DataType` instance, converting from a class if necessary and
raising an error for invalid types.
* All field subclasses now explicitly declare their `dtype` using
`field(default_factory=...)` or `field(default=..., init=False)`,
ensuring consistent and correct schema generation.
**Schema and Serialization Improvements**
* The `from_dict` method for fields now skips non-init dataclass fields
during deserialization, preventing errors from fields that shouldn't be
set via the constructor.
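
A minimal sketch of the idea, assuming a plain dataclass-based field (the `BoxField` class and its attributes are hypothetical, not taken from the codebase): `dataclasses.fields()` exposes an `init` flag per field, so deserialization can filter on it before calling the constructor.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class BoxField:
    name: str
    normalized: bool = False
    # Not a constructor argument, but still part of the serialized form.
    dtype: str = field(default="float32", init=False)

    def to_dict(self) -> dict[str, Any]:
        return {f.name: getattr(self, f.name) for f in fields(self)}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "BoxField":
        # Pass only the keys that the generated __init__ accepts.
        kwargs = {
            f.name: data[f.name]
            for f in fields(cls)
            if f.init and f.name in data
        }
        return cls(**kwargs)


payload = BoxField(name="bbox", normalized=True).to_dict()
restored = BoxField.from_dict(payload)
```

Without the `f.init` filter, `BoxField(**payload)` would raise `TypeError` because `dtype` is not an accepted keyword argument.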
**Test Suite Updates**
* Updated integration and unit tests to use explicit types for
callable/image/mask fields, replacing `Any` with more precise unions
(e.g., `np.ndarray | Callable[[], np.ndarray]`). This strengthens type
checking and reflects the stricter validation logic.
* Added a new unit test to verify correct validation and conversion of
field `dtype`, including error handling for invalid types.
**General Codebase Maintenance**
* Added missing imports and minor refactoring for clarity and
correctness in test and core files.
These changes collectively improve the robustness and maintainability of
the experimental dataset system, especially around schema definition,
attribute validation, and field type safety.
Resolves #1855
### Checklist
- [x] I have added tests to cover my changes or documented any manual
tests.
- [ ] I have updated the
[documentation](https://github.com/open-edge-platform/datumaro/tree/develop/docs)
accordingly
---------
Signed-off-by: Jort Bergfeld <[email protected]>

1 parent 366941e commit 3f3287b
File tree: 27 files changed (+483, -323 lines)
- src/datumaro/experimental
  - converters
  - fields
  - legacy
  - tiling
- tests
  - integration/experimental
  - unit/experimental
    - fields