Efficient way to integrate lossyless into a PyTorch Dataset subclass

Hey @YannDubs,

I recently discovered your paper and find the idea very interesting, so I would like to integrate `lossyless` into a project I am currently working on. However, my project has two requirements that, as far as I understand, your compressor on PyTorch Hub does not cover:
- I assume that the training data do not fit into memory so I cannot decompress the entire dataset at once.
- Because I cannot load the entire data into memory and shuffle them there, I need access to individual samples of the dataset (for random permutations) without touching the rest of the data (or as little as possible).

Basically, I would like to integrate `lossyless` into a subclass of PyTorch's `Dataset` that implements the `__getitem__(index)` interface. Before I start experimenting on my own and potentially overlook something you have already thought about, I wanted to ask whether you have already considered how to integrate your idea into a PyTorch `Dataset`.
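
To make the access pattern concrete, here is a minimal sketch of what such a dataset could look like. It assumes each sample is compressed independently and stored as its own record, with an in-memory list of `(offset, length)` pairs so that `__getitem__` reads and decompresses only one record. The names `write_shard` and `CompressedSampleDataset` are hypothetical, and `zlib` is only a stand-in codec; nothing here is the `lossyless` API, whose compress/decompress calls would have to be passed in as the `compress` and `decompress` callables.

```python
import zlib

try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:
    Dataset = object  # fallback so the sketch also runs without PyTorch


def write_shard(path, samples, compress):
    """Write each sample as a separate compressed record.

    Returns a list of (offset, length) pairs, one per sample.
    `compress` is a hypothetical callable mapping a sample to bytes.
    """
    index = []
    with open(path, "wb") as f:
        for sample in samples:
            blob = compress(sample)
            index.append((f.tell(), len(blob)))
            f.write(blob)
    return index


class CompressedSampleDataset(Dataset):
    """Map-style dataset that decompresses one record per __getitem__ call."""

    def __init__(self, path, index, decompress):
        self.path = path
        self.index = index            # small: two integers per sample
        self.decompress = decompress  # hypothetical callable: bytes -> sample

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        offset, length = self.index[i]
        # Open the file per call so no file handle is shared across
        # DataLoader worker processes.
        with open(self.path, "rb") as f:
            f.seek(offset)
            blob = f.read(length)
        return self.decompress(blob)
```

With `zlib` as the stand-in codec, usage would be `index = write_shard("data.bin", samples, zlib.compress)` followed by `CompressedSampleDataset("data.bin", index, zlib.decompress)`. Only the index has to fit in memory, so a shuffling sampler can request arbitrary indices without touching the other records. Whether per-sample compression keeps the rate advantage of the actual compressor is exactly the open question here.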

Looking forward to a discussion!
