Could it be possible for Dataloader to pass its Dataset to the Batcher? #3580
Closed
chrishulbert
started this conversation in
Ideas
Replies: 1 comment 6 replies
-
Hey 👋 Not sure I fully understand the motivation. The current design intentionally separates responsibilities. If you want your item to just hold indices, you could technically still do that by having the dataset in your batcher. For example:

```rust
pub struct MyBatcher {
    dataset: MyDataset,
}

impl<B: Backend> Batcher<B, MyItem, MyBatch<B>> for MyBatcher {
    fn batch(&self, items: Vec<MyItem>, device: &B::Device) -> MyBatch<B> {
        // For each index in items: `self.dataset.get(index)`
    }
}
```
-
Hi, for efficiency when implementing a timeseries LSTM, I am storing the series data as a vec in my Dataset and keeping my Items as lean as possible, with just an index into the vec. This works fine; however, I have to fool around a little with Arc to resolve some lifetime issues, since I need to duplicate the Dataset and store it in the Batcher.
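To illustrate, here is a minimal, self-contained sketch of that workaround. `SeriesDataset`, `MyBatcher`, and `MyItem` are hypothetical stand-ins for my actual types, not Burn's real traits; the point is that the dataset is wrapped in an `Arc` so both the dataloader side and the batcher can own a handle to it without lifetime gymnastics:

```rust
use std::sync::Arc;

// Hypothetical stand-in for my dataset: the full timeseries data lives here, stored once.
struct SeriesDataset {
    series: Vec<Vec<f32>>,
}

impl SeriesDataset {
    fn get(&self, index: usize) -> &Vec<f32> {
        &self.series[index]
    }
    fn len(&self) -> usize {
        self.series.len()
    }
}

// Items stay lean: just an index into the dataset's vec.
type MyItem = usize;

// The batcher shares ownership of the dataset via Arc, so no lifetime issues.
struct MyBatcher {
    dataset: Arc<SeriesDataset>,
}

impl MyBatcher {
    fn batch(&self, items: Vec<MyItem>) -> Vec<Vec<f32>> {
        // Resolve each index against the shared dataset to build the batch.
        items
            .into_iter()
            .map(|i| self.dataset.get(i).clone())
            .collect()
    }
}

fn main() {
    let dataset = Arc::new(SeriesDataset {
        series: vec![vec![1.0, 2.0], vec![3.0, 4.0]],
    });
    // Both the caller and the batcher hold a handle to the same dataset.
    let batcher = MyBatcher {
        dataset: Arc::clone(&dataset),
    };
    let batch = batcher.batch(vec![1, 0]);
    assert_eq!(batch, vec![vec![3.0, 4.0], vec![1.0, 2.0]]);
    println!("batched {} items from a dataset of {}", batch.len(), dataset.len());
}
```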
If Burn was able to pass the Dataset as an argument to Batcher::batch, this would all be far simpler. This could be done easily here:
burn/crates/burn-core/src/data/dataloader/batch.rs
Line 173 in 2d980d7
Would you consider such a change? I understand it may be a breaking change, but it'd be simple enough: existing batch functions could have the extra parameter added and leave it unused. I think it would be nice if batchers could conveniently access the dataset, and it makes ownership easier. If you think this is a good idea, I could write a PR; please let me know! :)
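For concreteness, here is a sketch of what I have in mind, using a simplified stand-in trait rather than Burn's actual `Batcher` (the trait shape, `SeriesDataset`, and `IndexBatcher` are hypothetical): the dataloader would pass a reference to its dataset as an extra argument to `batch`, so items can stay as plain indices with no `Arc` juggling on the user's side.

```rust
// Hypothetical stand-in for a dataset owned by the dataloader.
struct SeriesDataset {
    series: Vec<f32>,
}

// Hypothetical trait sketch: `batch` receives the dataset as an extra argument,
// in addition to the items and (in Burn's real trait) the device.
trait Batcher<I, O, D> {
    fn batch(&self, items: Vec<I>, dataset: &D) -> O;
}

// The batcher itself holds no data; items are just indices.
struct IndexBatcher;

impl Batcher<usize, Vec<f32>, SeriesDataset> for IndexBatcher {
    fn batch(&self, items: Vec<usize>, dataset: &SeriesDataset) -> Vec<f32> {
        // Look each index up in the dataset the dataloader passed in.
        items.into_iter().map(|i| dataset.series[i]).collect()
    }
}

fn main() {
    let dataset = SeriesDataset {
        series: vec![10.0, 20.0, 30.0],
    };
    let batcher = IndexBatcher;
    let batch = batcher.batch(vec![2, 0], &dataset);
    assert_eq!(batch, vec![30.0, 10.0]);
}
```

Batchers that don't need the dataset could simply ignore the extra parameter, which is why I think the migration cost would be small.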