pmap for a function containing large, static arrays #7200
Replies: 1 comment 1 reply
Have you tried
I'm curious what the best way is to parallelize evaluation of a function containing large, static arrays.
I need to evaluate
where Y, W, and X are all large but static arrays, each of size 512x512. Only the parameter `theta` needs to change between evaluations. I need to evaluate many thetas, roughly 1000^2.
Option 1: Run on a single device.
This results in a GPU utilization of ~60%.
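For reference, a minimal sketch of batching many thetas on a single device with `jax.vmap` under `jax.jit`. The function `f` here is a hypothetical stand-in, since the actual expression is not shown in the post; the random arrays, the scalar `theta`, and the batch size of 64 are likewise assumptions for illustration only.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the function in the post (the real one is not shown).
def f(theta, Y, W, X):
    return jnp.sum(W * (Y - theta * X) ** 2)

# Placeholder data with the sizes mentioned in the post.
kY, kW, kX = jax.random.split(jax.random.PRNGKey(0), 3)
Y = jax.random.normal(kY, (512, 512))
W = jax.random.uniform(kW, (512, 512))
X = jax.random.normal(kX, (512, 512))

# Map over theta only; Y, W, X are shared by every element of the batch.
f_batched = jax.jit(jax.vmap(f, in_axes=(0, None, None, None)))

thetas = jnp.linspace(-1.0, 1.0, 64)  # assumed batch size
out = f_batched(thetas, Y, W, X)      # one result per theta
```

Evaluating a whole batch per call may reduce per-call dispatch overhead, which is one possible (unconfirmed) cause of less-than-full utilization.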
Option 2: Parallelize over multiple devices.
I wrote a simple generator to return thetas in batches of 8, which is the number of GPUs on this machine. Then
This gives the same answer as Option 1, but at roughly the same speed. Average GPU utilization is only about 5% on each of the GPUs.
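With batches of 8, each GPU receives a single theta per call, so each call does very little work per device. A sketch of one alternative, under assumptions: give each device a whole chunk of thetas by nesting `jax.vmap` inside `jax.pmap`. The function `f`, the random data, and the chunk size `per_dev` are hypothetical placeholders, not the code from the post.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the function in the post (the real one is not shown).
def f(theta, Y, W, X):
    return jnp.sum(W * (Y - theta * X) ** 2)

kY, kW, kX = jax.random.split(jax.random.PRNGKey(0), 3)
Y = jax.random.normal(kY, (512, 512))
W = jax.random.uniform(kW, (512, 512))
X = jax.random.normal(kX, (512, 512))

n_dev = jax.local_device_count()  # 8 on the machine described in the post
per_dev = 32                      # assumed number of thetas per device per call

# Inner vmap: many thetas on one device. Outer pmap: one chunk per device.
# in_axes=None broadcasts Y, W, X instead of splitting them.
f_multi = jax.pmap(
    jax.vmap(f, in_axes=(0, None, None, None)),
    in_axes=(0, None, None, None),
)

thetas = jnp.linspace(-1.0, 1.0, n_dev * per_dev).reshape(n_dev, per_dev)
out = f_multi(thetas, Y, W, X)    # shape (n_dev, per_dev)
```

Whether this raises utilization on a given machine would need to be measured; the point of the sketch is only that the leading axis indexes devices and the second axis indexes thetas within a device.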
I'm guessing that utilization is low in the `pmap` version because the arrays Y, W, X are constantly being transferred to the respective devices. I'm not sure how to test that, though.
Could there be a way to avoid this using `device_put_replicated`? Or, more generally, what's the best way to evaluate a function like this using multiple devices?
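A minimal sketch of the `device_put_replicated` idea, assuming `jax.device_put_replicated` is available in the installed JAX version. The static arrays are copied to every local device once, and the replicated copies are then passed to `pmap` as ordinary mapped arguments. As before, `f` and the random data are hypothetical placeholders for the code not shown in the post.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the function in the post (the real one is not shown).
def f(theta, Y, W, X):
    return jnp.sum(W * (Y - theta * X) ** 2)

kY, kW, kX = jax.random.split(jax.random.PRNGKey(0), 3)
Y = jax.random.normal(kY, (512, 512))
W = jax.random.uniform(kW, (512, 512))
X = jax.random.normal(kX, (512, 512))

devices = jax.local_devices()
n_dev = len(devices)

# Copy each static array to every device once, up front.
# Each result has a leading device axis: shape (n_dev, 512, 512).
Y_rep = jax.device_put_replicated(Y, devices)
W_rep = jax.device_put_replicated(W, devices)
X_rep = jax.device_put_replicated(X, devices)

# All arguments are mapped over the leading (device) axis.
f_rep = jax.pmap(f)

thetas = jnp.linspace(-1.0, 1.0, n_dev)   # one theta per device per call
out = f_rep(thetas, Y_rep, W_rep, X_rep)  # shape (n_dev,)
```

Because the replicated arrays already live on the devices, repeated calls to `f_rep` should only need to move the new thetas. Comparing the per-call time of this version against the original is one way to test whether transfers were the bottleneck.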