Batch.from_data_list() causing bottleneck - any ideas? #3645

gpdwatkins · 2021-12-07T11:13:42Z

Hi, I'm using PyTorch Geometric for reinforcement learning (just a standard DQN), and my biggest computational bottleneck is constructing the batches with Batch.from_data_list(), as shown in the profiling below.

Other than reducing the batch size, I'm looking for ways of improving the efficiency of this step.

When sampling experiences to construct the batch, each experience contains a 'state' and a 'next state' (both of type Data). I construct two separate Batch objects: one from a list of 'states' and one from a list of 'next states'. Note that the sampled experiences aren't consecutive, so the 'states' and 'next states' lists aren't just offset by 1; they're completely unrelated.

One idea I had - I could construct a single Data object using more keys (x_state, edge_index_state, edge_attr_state, x_next_state, edge_index_next_state, edge_attr_next_state). This would then require only one call to from_data_list(), but with more keys the function might simply take twice as long to run.

Any thoughts about how to speed this up (or even if it's possible) would be much appreciated!

rusty1s · 2021-12-07T12:09:01Z

Maintainer

I always thought that Batch.from_data_list() is pretty fast :) Usually, its speed is not a problem, since we can use multiple workers in DataLoader to work around it. As far as I can tell, using multiple workers does not seem possible in your case, correct?

It's hard for me to tell how we can improve runtime (all we are doing is collecting lists of attributes and using torch.cat), but runtime certainly depends on the number of attributes and the batch size, so using two Data objects with fewer attributes should be as efficient as having a single Data object with twice as many attributes.
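
To make "collecting lists of attributes and using torch.cat" concrete, here is a rough plain-PyTorch sketch of that kind of collation. It is not the library's actual implementation: the function name `collate_graphs` and the dict-based graph representation are assumptions made for illustration. It does show why the cost grows with both the number of attributes and the batch size.

```python
import torch


def collate_graphs(graphs):
    """Merge a list of graphs into one disconnected graph.

    Each graph is assumed to be a dict with 'x' [num_nodes, F],
    'edge_index' [2, num_edges] and 'edge_attr' [num_edges, D].
    """
    xs, edge_indices, edge_attrs, batch = [], [], [], []
    node_offset = 0
    for i, g in enumerate(graphs):
        num_nodes = g["x"].size(0)
        xs.append(g["x"])
        # Shift node indices so they point into the concatenated x.
        edge_indices.append(g["edge_index"] + node_offset)
        edge_attrs.append(g["edge_attr"])
        # Record which graph each node belongs to.
        batch.append(torch.full((num_nodes,), i, dtype=torch.long))
        node_offset += num_nodes
    # One torch.cat per attribute, each over len(graphs) tensors.
    return {
        "x": torch.cat(xs, dim=0),
        "edge_index": torch.cat(edge_indices, dim=1),
        "edge_attr": torch.cat(edge_attrs, dim=0),
        "batch": torch.cat(batch, dim=0),
    }
```

If the graphs in a replay buffer all share a fixed structure, a hand-written collate along these lines might avoid some per-attribute bookkeeping, but whether that helps in practice would need to be measured.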

Do you have a small example to reproduce the inefficiency in your case? I'm happy to look into any bottlenecks.
