I am stress-testing the performance of matmul-like kernels on 1x TPU v4, but I can't push the performance above 50% of the peak FLOPS (275 TFLOPS) specified in the paper. The following code shows the measurement:

```python
import jax
import jax.numpy as jnp

device = jax.default_device()
print(f"Jax default backend: {jax.default_backend()}")

key = jax.random.key(0)
x_bf16 = jax.random.uniform(key, (2**16, 4096), dtype=jnp.bfloat16)
y_bf16 = jax.random.uniform(key, (4096, 4096), dtype=jnp.bfloat16)

dot = jax.jit(jax.lax.dot)  # .lower(x_bf16, y_bf16).compile()
dot(x_bf16, y_bf16).block_until_ready()
%timeit dot(x_bf16, y_bf16).block_until_ready()
```

which gives a throughput of only about half the peak. Is it because the code is not using the other processor on the TPU v4? If so, how do I make it use both?
Are you missing a factor of 2 in your flop count? I.e., for an `[M, N]` by `[N, K]` matmul you need `M*N*K` fused multiply-adds, which is `2*M*N*K` flops, since each FMA counts as two ops (one multiply, one add).
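
A minimal sketch of the corrected throughput calculation, using the shapes from the question. The function name `achieved_tflops` and the 16 ms example time are hypothetical, just to illustrate the arithmetic:

```python
M, K = 2**16, 4096   # shape of x_bf16
N = 4096             # second dim of y_bf16

# A [M, K] by [K, N] matmul performs M*K*N fused multiply-adds.
# Each FMA counts as 2 floating-point ops, so the total is 2*M*K*N,
# not M*K*N -- forgetting the factor of 2 halves the reported throughput.
flops = 2 * M * K * N

def achieved_tflops(t_seconds):
    """Throughput in TFLOP/s for a measured per-call time."""
    return flops / t_seconds / 1e12

# Example: a hypothetical 16 ms per call would give
# achieved_tflops(16e-3) ~= 137 TFLOP/s, i.e. about 50% of the
# 275 TFLOP/s peak; counted as only M*K*N flops it would look like ~25%.
```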