Skip to content

Efficient way to integrate lossyless into a PyTorch Dataset subclass #40

@lbhm

Description

@lbhm

Hey @YannDubs,

I recently discovered your paper and find the idea very interesting. Therefore, I would like to integrate lossyless into a project I am currently working on. However, there are two requirements/presuppositions in my project that your compressor on PyTorch Hub does not cover as far as I understand it:

  • I assume that the training data do not fit into memory so I cannot decompress the entire dataset at once.
  • Because I cannot load the entire data into memory and shuffle them there, I need access to individual samples of the dataset (for random permutations) without touching the rest of the data (or as little as possible).

Basically, I would like to integrate lossyless into a subclass of PyTorch's Dataset that implements the __getitem__(index) interface. Before I start experimenting on my own and potentially overlook something that you already thought about, I wanted to ask you if you already considered approaches how to integrate your idea into a PyTorch Dataset.

Looking forward to a discussion!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions