Slow array programming on GPU compared to CPU #21617
-
Hi all, I am writing a PDE solver in JAX and getting unsatisfactory performance on GPU. It turns out that one of the problems boils down to the example below: the GPU was 30 times slower than the CPU for the element-wise multiplication of two vectors. Is this because of data transfer between the host (CPU) and the device (GPU)? Let me know if any of you know a better way to implement it.
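The original code block did not survive here, so the snippet below is only a hypothetical reconstruction of the kind of benchmark described: an element-wise product of two small vectors, timed after JIT compilation. The vector length N = 101 is taken from the reply below; the variable names and the timing loop are assumptions.

# Hypothetical reconstruction of the benchmark described above (the original
# code block is missing); N = 101 is taken from the reply below.
import time

import jax
import jax.numpy as jnp

N = 101
a = jnp.linspace(0.0, 1.0, N)
b = jnp.linspace(1.0, 2.0, N)

@jax.jit
def multiply(x, y):
    # Element-wise product of two vectors.
    return x * y

# Warm up once so compilation time is not included in the measurement.
multiply(a, b).block_until_ready()

start = time.perf_counter()
for _ in range(1000):
    multiply(a, b).block_until_ready()
print(f"mean time per call: {(time.perf_counter() - start) / 1000:.2e} s")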
-
I think you have the wrong mental model of a GPU. You shouldn't think of a GPU as a faster CPU. You should think of a GPU as a whole bunch of really slow CPUs with fast communication and shared memory, which can work together to do large computations in parallel, thereby beating a typical CPU that doesn't have access to such parallelism.
When you do a small computation (like your 100 element-wise multiplications), your problem is not really in a regime where you can benefit from the inherent parallelism of the GPU, and so the CPU will outperform it. On larger problems, you should find that the GPU will outperform the CPU: for example, if I change your code from N = 101 to N = 10000001, I…
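To make the scaling point concrete, here is a minimal sketch of that experiment, assuming the same element-wise product as in the reconstruction above: run it once at N = 101 and once at N = 10000001 and compare the per-call time on whatever device JAX selects by default. The actual timings in the original reply are truncated, so no numbers are reproduced here.

# Minimal sketch of the scaling experiment from the reply: time the same
# element-wise product at a small and a large N on the default JAX device.
import time

import jax
import jax.numpy as jnp

@jax.jit
def multiply(x, y):
    return x * y

for N in (101, 10_000_001):
    a = jnp.ones(N, dtype=jnp.float32)
    b = jnp.full(N, 2.0, dtype=jnp.float32)
    multiply(a, b).block_until_ready()  # warm-up compile for this shape
    start = time.perf_counter()
    for _ in range(100):
        multiply(a, b).block_until_ready()
    per_call = (time.perf_counter() - start) / 100
    print(f"N = {N:>10d}: {per_call:.2e} s per call "
          f"on {jax.devices()[0].platform}")

At N = 101 the kernel launch and synchronization overhead dominates, so the CPU backend typically wins; at N = 10000001 the arithmetic is large enough for the GPU's parallelism to pay off.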