Hi, I've been working with PNAConv lately, but I'm having some difficulty understanding how towers work in PNAConv. I've read the PNA paper and also the "Neural Message Passing for Quantum Chemistry" paper, where the tower concept comes from, but at this point I'm still unsure how it actually works. The only thing I know is that it helps with faster training, but I don't really know the background of it. I'm asking because I need to reduce the number of towers so that the model doesn't use so much memory on my GPU. I've been dealing with disjunctive graphs of the Job Shop Scheduling Problem of size bigger than 6x6, which have 38 nodes and 222 edges. Besides the number of towers, I also needed to change parameters such as the number of layers and the hidden dimension of PNAConv; otherwise I always ran out of GPU memory. The biggest question is: does reducing the number of towers badly affect the performance of PNAConv? If yes, what parameters should I change first before reducing the number of towers?
Replies: 1 comment 5 replies
I'm curious to understand why you have GPU memory problems when operating on a graph with around 40 nodes :)
The `towers` argument is similar to the `groups` argument in `torch.nn.Conv2d`, where you subdivide your number of features into groups, and each group is solely transformed based on the features inside the same group. This reduces the number of parameters from `in_channels * out_channels` to `num_groups * (in_channels / num_groups) * (out_channels / num_groups)` and, as a result, might prevent overfitting. I personally don't think that model performance is highly sensitive to the `towers` size.
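If it helps, here is a minimal sketch (not from the original reply; the channel sizes and the degree histogram are made-up placeholders, and it assumes `torch` and `torch_geometric` are installed) that just counts parameters, to make the `groups`/`towers` analogy and the parameter reduction concrete:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import PNAConv

# Grouped Conv2d: weights drop from in * out to in * out / groups.
for groups in [1, 2, 4]:
    conv = nn.Conv2d(64, 64, kernel_size=1, groups=groups, bias=False)
    print(f'Conv2d groups={groups}: {conv.weight.numel()} weights')

# Placeholder in-degree histogram; in practice, compute it from your
# training graphs.
deg = torch.tensor([0, 10, 20, 10, 5])

# PNAConv towers: out_channels must be divisible by `towers`
# (and in_channels as well if divide_input=True).
for towers in [1, 2, 4]:
    conv = PNAConv(
        in_channels=64,
        out_channels=64,
        aggregators=['mean', 'min', 'max', 'std'],
        scalers=['identity', 'amplification', 'attenuation'],
        deg=deg,
        towers=towers,
    )
    n_params = sum(p.numel() for p in conv.parameters())
    print(f'PNAConv towers={towers}: {n_params} parameters')
```

So before touching `towers`, the hidden dimension and the number of layers are the usual first levers for memory, since they dominate both the parameter count and the size of the intermediate activations.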