Hello!
I'm developing a library to build and execute predictive coding networks (PCNs). One of the main features of PCNs is that they only use local computations: each layer's forward and backward pass is independent of the others. As a consequence, it should be possible to execute all layers in parallel (the layers can be of any kind: convolutional, linear, etc.). However, even when jitting, they still appear to be executed sequentially. It would be great to overcome this, as the network could then train up to L times faster (where L is the number of layers).
In pseudo-code, what I'm trying to achieve is more or less the following:
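(A minimal sketch with a toy linear-layer energy; `layer_energy`, `forward_and_backward`, and the parameter layout are just illustrative names, not my actual library code.)

```python
import jax
import jax.numpy as jnp

def layer_energy(params_l, x_prev, x_l):
    # Local prediction error of one (linear) layer: how far the activity x_l
    # is from the prediction computed from the previous layer's activity.
    pred = x_prev @ params_l["W"] + params_l["b"]
    return 0.5 * jnp.sum((x_l - pred) ** 2)

@jax.jit
def forward_and_backward(params, xs):
    # Each iteration only touches layer-local state (params[l], xs[l], xs[l+1]),
    # so in principle all L iterations could run at the same time.
    # In practice, the compiled program still launches them one after another.
    grads = []
    for l, p in enumerate(params):
        grads.append(jax.grad(layer_energy)(p, xs[l], xs[l + 1]))
    return grads
```

Here `params` is a list of per-layer parameter pytrees and `xs` is the list of activity nodes (length L + 1). The point is that no iteration of the loop depends on another one.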
I've been trying with very small batch sizes and hidden dimensions (on an RTX TITAN), and the total time of a `forward_and_backward` step (averaged over an epoch of training) scales linearly with the number of layers, while GPU utilization can be as low as 10%.
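(For reference, this is roughly how I measure the step time; `avg_step_time` is just an illustrative helper, and `forward_and_backward` is the jitted function from the sketch above.)

```python
import time
import jax

def avg_step_time(step_fn, params, xs, n_iters=100):
    step_fn(params, xs)                 # warm-up call to trigger compilation
    start = time.perf_counter()
    for _ in range(n_iters):
        out = step_fn(params, xs)
    jax.block_until_ready(out)          # account for JAX's async dispatch
    return (time.perf_counter() - start) / n_iters
```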
Any help would be much appreciated.