Commit 04948e7
Multi-label and hierarchical-label classification support (#1925)
This pull request introduces comprehensive support for hierarchical
label categories in the experimental categories module. It adds new data
structures to represent hierarchical relationships between labels,
updates legacy conversion logic to handle hierarchical and multi-label
classification, and improves validation and compatibility. The changes
also make several dataclasses mutable and update type imports for
broader compatibility.
**Hierarchical label support and data structures:**
* Introduced new classes: `HierarchicalLabelCategory`, `LabelGroup`, and
`HierarchicalLabelCategories` in `categories.py` to represent
hierarchical label structures, label groups, and provide methods for
hierarchy traversal and validation. These classes support parent-child
relationships, groupings, and compatibility with existing interfaces.
* Added extensive validation in
`HierarchicalLabelCategories.__post_init__` to ensure label uniqueness,
group-label consistency, and valid parent references.
* Provided utility methods for hierarchy navigation (e.g., `find`,
`get_children`, `get_parent`, `get_hierarchy_level`) and compatibility
with legacy APIs.
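
A minimal sketch of the validation and navigation described above, for readers who want to see the shape of the checks. It is not the code in `categories.py`: the class names `LabelSketch`, `GroupSketch`, and `HierarchySketch`, their fields, the use of an empty string for "no parent", and all method signatures are assumptions made for illustration. Only the method names `find`, `get_children`, `get_parent`, and `get_hierarchy_level` come from this description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LabelSketch:
    """Hypothetical stand-in for a hierarchical label."""
    name: str
    parent: str = ""  # assumption: empty string marks a root label


@dataclass
class GroupSketch:
    """Hypothetical stand-in for a label group."""
    name: str
    labels: list[str] = field(default_factory=list)


@dataclass
class HierarchySketch:
    """Hypothetical container that validates itself after construction."""
    items: list[LabelSketch] = field(default_factory=list)
    groups: list[GroupSketch] = field(default_factory=list)

    def __post_init__(self) -> None:
        names = [item.name for item in self.items]
        # 1. Label uniqueness.
        if len(names) != len(set(names)):
            raise ValueError("label names must be unique")
        known = set(names)
        # 2. Valid parent references.
        for item in self.items:
            if item.parent and item.parent not in known:
                raise ValueError(f"unknown parent {item.parent!r} for {item.name!r}")
        # 3. Group-label consistency.
        for group in self.groups:
            missing = [name for name in group.labels if name not in known]
            if missing:
                raise ValueError(f"group {group.name!r} references unknown labels {missing}")

    def find(self, name: str) -> tuple[Optional[int], Optional[LabelSketch]]:
        """Return (index, label), or (None, None) when the name is absent."""
        for index, item in enumerate(self.items):
            if item.name == name:
                return index, item
        return None, None

    def get_parent(self, name: str) -> Optional[LabelSketch]:
        _, item = self.find(name)
        if item is None or not item.parent:
            return None
        return self.find(item.parent)[1]

    def get_children(self, name: str) -> list[LabelSketch]:
        return [item for item in self.items if item.parent == name]

    def get_hierarchy_level(self, name: str) -> int:
        """Count ancestors; root labels are at level 0."""
        if self.find(name)[1] is None:
            raise KeyError(name)
        level, seen = 0, {name}
        parent = self.get_parent(name)
        while parent is not None:
            if parent.name in seen:  # guard against parent cycles
                raise ValueError(f"cycle detected at {parent.name!r}")
            seen.add(parent.name)
            level += 1
            parent = self.get_parent(parent.name)
        return level
```

Running the checks in `__post_init__` means an inconsistent hierarchy fails at construction time rather than during a later traversal.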
**Legacy dataset conversion and analysis improvements:**
* Enhanced `analyze_legacy_dataset` and `convert_from_legacy` in
`legacy.py` to detect hierarchical and multi-label projects, convert
legacy label/group structures to the new classes, and ensure
hierarchical labels are handled as lists.
* Added helper functions `_attributes_to_dict` and `_has_derived_labels`
for legacy attribute parsing and hierarchy detection.
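
The detection step can be pictured roughly as follows. This description does not show the legacy data layout or the signatures of `_attributes_to_dict` and `_has_derived_labels`, so everything in this sketch is hypothetical: labels are plain dicts with `name` and `parent` keys, the function names are invented, and the rule "more than one group implies multi-label" is an assumption rather than the rule used in `legacy.py`.

```python
from typing import Iterable, Mapping, Sequence, Union


def has_parent_links(labels: Iterable[Mapping[str, str]]) -> bool:
    """Return True when any label names a non-empty parent (assumed layout)."""
    return any(label.get("parent") for label in labels)


def classify_project(
    labels: Sequence[Mapping[str, str]],
    groups: Sequence[Sequence[str]],
) -> str:
    """Pick a task kind from the label metadata (assumed decision rule)."""
    if has_parent_links(labels):
        return "hierarchical"
    if len(groups) > 1:
        return "multi_label"
    return "multi_class"


def as_label_list(value: Union[int, Iterable[int]]) -> list[int]:
    """Normalise a single label index or an iterable of indices to a list."""
    if isinstance(value, int):
        return [value]
    return list(value)
```

Normalising to a list keeps downstream code uniform: a sample with one hierarchical label and a sample with several go through the same code path.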
**API and compatibility changes:**
* Added the `RESTRICTED` value to the `GroupType` enum to support empty
label groups.
* Broadened type imports in `categories.py` for improved type hinting
and compatibility.
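
To illustrate why an extra enum value helps with empty groups, here is a small sketch. Only the names `GroupType` and `RESTRICTED` appear in this description. The enum name `GroupKind`, the other two members, the `auto()` values, and the `RestrictableGroup` class are assumptions, as is the rule that an empty group is accepted only when it is restricted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class GroupKind(Enum):
    """Hypothetical mirror of a group-type enum; values are assumptions."""
    EXCLUSIVE = auto()   # assumed: at most one label of the group applies
    INCLUSIVE = auto()   # assumed: several labels of the group may apply
    RESTRICTED = auto()  # the kind that may carry no labels at all


@dataclass
class RestrictableGroup:
    """Hypothetical group that rejects an empty label list unless restricted."""
    name: str
    labels: list[str] = field(default_factory=list)
    kind: GroupKind = GroupKind.EXCLUSIVE

    def __post_init__(self) -> None:
        if not self.labels and self.kind is not GroupKind.RESTRICTED:
            raise ValueError(f"group {self.name!r} is empty but not RESTRICTED")
```

For example, `RestrictableGroup("empty", kind=GroupKind.RESTRICTED)` constructs successfully in this sketch, while `RestrictableGroup("empty")` raises `ValueError`.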
**Other codebase updates:**
* Updated test imports in `test_categories.py` to include new
hierarchical classes.
* Simplified Polars conversion logic in `fields.py` for label
serialization.
These changes lay the groundwork for robust hierarchical and multi-label
classification support in the experimental Datumaro API.
### Checklist
- [ ] I have added tests to cover my changes or documented any manual
tests.
- [ ] I have updated the
[documentation](https://github.com/open-edge-platform/datumaro/tree/develop/docs)
accordingly
---------
Signed-off-by: Albert van Houten <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 47e3576 commit 04948e7
File tree
5 files changed, +578 -21 lines changed
- src/datumaro/experimental
- tests/unit/experimental