Labels
`data format` — Related to the zea data format saving and loading; `efficiency` — Improvements made regarding code or tests efficiency
Description
Any dataloader that uses the `H5Generator` is effectively single-threaded and CPU-bound:
Line 376 in 2a3a9a6: `def iterator(self):`
Unless I am completely misunderstanding the situation, as far as I know:

`tf.data.Dataset.from_generator(image_extractor, ...)`

means that `next()` runs in Python. This gives the pipeline:

GPU waits → Python → h5py → Python → TF → GPU

and `num_workers`, `AUTOTUNE`, `prefetch`, and `batch` have no effect in speeding this up.
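To illustrate the serialization problem, here is a small stdlib-only sketch (not zea or TF code; `slow_batches`, `prefetch`, and all timings are hypothetical stand-ins). It shows that when loading and compute run in one Python loop, total time is the sum of both, and that a background-thread prefetcher can overlap them. One caveat, hedged explicitly: `time.sleep` releases the GIL, so the overlap is clean here; real h5py reads plus Python post-processing hold the GIL for much of their duration, which is exactly why thread-based prefetching alone may not solve this issue.

```python
import threading
import queue
import time

def slow_batches(n, load_time=0.05):
    """Hypothetical stand-in for an H5-style loader: each batch costs
    `load_time` of single-threaded Python work."""
    for i in range(n):
        time.sleep(load_time)  # stand-in for h5py read + Python copies
        yield i

def prefetch(gen, buffer_size=4):
    """Illustrative prefetcher: drains `gen` on a background thread so
    loading can overlap with the consumer's (GPU) work."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking end of the stream

    def worker():
        for item in gen:
            q.put(item)
        q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

def consume(batches, step_time=0.05):
    """Stand-in for the training loop (forward + backward pass)."""
    out = []
    for b in batches:
        time.sleep(step_time)
        out.append(b)
    return out

# Sequential: every batch pays load_time + step_time back to back.
t0 = time.perf_counter()
seq = consume(slow_batches(8))
t_seq = time.perf_counter() - t0

# Prefetched: loading overlaps compute, so wall time is roughly
# max(load, compute) per batch instead of their sum.
t0 = time.perf_counter()
pre = consume(prefetch(slow_batches(8)))
t_pre = time.perf_counter() - t0

print(f"sequential: {t_seq:.2f}s, prefetched: {t_pre:.2f}s")
```

Because the simulated loader releases the GIL while "loading", the prefetched run finishes in roughly half the sequential time; with GIL-holding h5py/NumPy work the gain would be smaller, pointing toward multi-process loading or native `tf.data` ops as the real fix.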
Currently I'm training an RF-data VAE where loading a batch takes 7 s while the forward and backward pass takes 120 ms, which is why I cannot use the zea dataloader.
I understand that fixing this is quite a large task as there are many dependencies, but it would be good to look into at some point.