Could it be possible for Dataloader to pass its Dataset to the Batcher? #3580
Closed
chrishulbert
started this conversation in
Ideas
Replies: 1 comment 6 replies
-
Hey 👋 Not sure I fully understand the motivation. The current design intentionally separates responsibilities. If you want your item to just hold indices, you could technically still do that by having the dataset in your batcher. For example:

```rust
pub struct MyBatcher {
    dataset: MyDataset,
}

impl<B: Backend> Batcher<B, MyItem, MyBatch<B>> for MyBatcher {
    fn batch(&self, items: Vec<MyItem>, device: &B::Device) -> MyBatch<B> {
        // For each index in items: `self.dataset.get(index)`
    }
}
```
-
Hi, for efficiency when implementing a timeseries LSTM, I am storing the series data as a vec in my Dataset and keeping my Items as lean as possible, with just an index into the vec. This works fine; however, I have to fool around a little with Arc to resolve some lifetime issues, since I need to duplicate the Dataset and store it in the Batcher.
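To illustrate, here is a minimal, self-contained sketch of that workaround. `SeriesDataset`, `MyBatcher`, and `MyItem` are hypothetical stand-ins for my actual types, not Burn's real traits; the point is that the dataset is wrapped in an `Arc` so both the dataloader side and the batcher can own a handle to it without lifetime gymnastics:

```rust
use std::sync::Arc;

// Hypothetical stand-in for my dataset: the full timeseries data lives here, stored once.
struct SeriesDataset {
    series: Vec<Vec<f32>>,
}

impl SeriesDataset {
    fn get(&self, index: usize) -> &Vec<f32> {
        &self.series[index]
    }
    fn len(&self) -> usize {
        self.series.len()
    }
}

// Items stay lean: just an index into the dataset's vec.
type MyItem = usize;

// The batcher shares ownership of the dataset via Arc, so no lifetime issues.
struct MyBatcher {
    dataset: Arc<SeriesDataset>,
}

impl MyBatcher {
    fn batch(&self, items: Vec<MyItem>) -> Vec<Vec<f32>> {
        // Resolve each index against the shared dataset to build the batch.
        items
            .into_iter()
            .map(|i| self.dataset.get(i).clone())
            .collect()
    }
}

fn main() {
    let dataset = Arc::new(SeriesDataset {
        series: vec![vec![1.0, 2.0], vec![3.0, 4.0]],
    });
    // Both the caller and the batcher hold a handle to the same dataset.
    let batcher = MyBatcher {
        dataset: Arc::clone(&dataset),
    };
    let batch = batcher.batch(vec![1, 0]);
    assert_eq!(batch, vec![vec![3.0, 4.0], vec![1.0, 2.0]]);
    println!("batched {} items from a dataset of {}", batch.len(), dataset.len());
}
```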
If Burn was able to pass the Dataset as an argument to Batcher::batch, this would all be far simpler. This could be done easily here:
burn/crates/burn-core/src/data/dataloader/batch.rs
Line 173 in 2d980d7
Would you consider such a change? I understand it may be a breaking change, but it'd be simple enough: existing batch functions could have the extra parameter added and leave it unused. I think it would be nice if batchers could conveniently access the dataset, and it makes ownership easier. If you think this is a good idea, I could write a PR; please let me know! :)
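For concreteness, here is a sketch of what I have in mind, using a simplified stand-in trait rather than Burn's actual `Batcher` (the trait shape, `SeriesDataset`, and `IndexBatcher` are hypothetical): the dataloader would pass a reference to its dataset as an extra argument to `batch`, so items can stay as plain indices with no `Arc` juggling on the user's side.

```rust
// Hypothetical stand-in for a dataset owned by the dataloader.
struct SeriesDataset {
    series: Vec<f32>,
}

// Hypothetical trait sketch: `batch` receives the dataset as an extra argument,
// in addition to the items and (in Burn's real trait) the device.
trait Batcher<I, O, D> {
    fn batch(&self, items: Vec<I>, dataset: &D) -> O;
}

// The batcher itself holds no data; items are just indices.
struct IndexBatcher;

impl Batcher<usize, Vec<f32>, SeriesDataset> for IndexBatcher {
    fn batch(&self, items: Vec<usize>, dataset: &SeriesDataset) -> Vec<f32> {
        // Look each index up in the dataset the dataloader passed in.
        items.into_iter().map(|i| dataset.series[i]).collect()
    }
}

fn main() {
    let dataset = SeriesDataset {
        series: vec![10.0, 20.0, 30.0],
    };
    let batcher = IndexBatcher;
    let batch = batcher.batch(vec![2, 0], &dataset);
    assert_eq!(batch, vec![30.0, 10.0]);
}
```

Batchers that don't need the dataset could simply ignore the extra parameter, which is why I think the migration cost would be small.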