
Thanks for the issue. I think you have two options here:

  1. Batch the sentences after the PyG DataLoader has done its work. For the sentences, the PyG DataLoader should simply return a Python list of elements of varying size, to which you can apply sequence_padding afterwards.
  2. Override the collate_fn of the torch.utils.data.DataLoader (similar to what we do in PyG as well, see here). In general, you can collate a list of data objects by simply calling Batch.from_data_list. Inside the collate_fn, you can then apply sequence_padding at the same time.
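To make option 1 concrete, here is a minimal pure-Python sketch of the padding step. `pad_batch` and `pad_token` are hypothetical names standing in for whatever your `sequence_padding` helper does; the loader is assumed to hand back the sentences as a plain Python list of token sequences:

```python
def pad_batch(sentences, pad_token=0):
    """Pad a list of variable-length token sequences to the batch maximum.

    Hypothetical stand-in for `sequence_padding`: the DataLoader returns
    the sentences as a plain Python list, and padding happens afterwards.
    """
    max_len = max(len(s) for s in sentences)
    return [s + [pad_token] * (max_len - len(s)) for s in sentences]

# A "batch" of three sentences with different lengths.
batch = [[5, 2, 9], [7], [3, 3, 3, 3]]
padded = pad_batch(batch)  # every row now has length 4
```

The same function can also be called from inside a custom collate_fn (option 2), right after Batch.from_data_list has collated the graph part of each example.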

I think this raises a valid point about how much customization we should support on top of the current PyG DataLoader. Currently, …

Answer selected by choo0518
Category
Q&A