-
I want to use a mask on an array of inputs. One idea I had was to split the process into 3 operations:
Conceptually, the first step would follow something like this:
and the final step could follow the approach described in #16962. However, I would like to make the whole process both differentiable and jittable. I would be grateful for any help on this!
-
There is no one-size-fits-all answer here, because it's all about tradeoffs.

Broadly speaking, you have an array of inputs that you would like to map to an array of outputs. You also have a device (GPU or TPU) that's purpose-built for array-oriented computing. There are two options:

(1) Embrace that array-oriented computing: compute your function for every entry in the array (taking advantage of the implicit array-oriented parallelism in the device architecture) and then mask out the results you don't want. There is wasted computation here, because you are computing an expensive result that will be thrown away in some cases, but the benefit is that you are using the hardware in precisely the way it was designed.

(2) Turn away from array-oriented computing, with the goal of avoiding this wasted computation. You could do this via some sort of sequential operation (e.g. stepping through the entries with a loop).

In most cases, approach (1) will win out, because the benefit of fully utilizing the accelerator architecture typically outweighs the disadvantage of extra computation, and this is partly why this approach is easiest to express in JAX and XLA. In some special cases, (2) will be better. There's no magic bullet, though, and the overhead involved with moving data, recompiling kernels, etc. will be very expensive. But if it's less expensive than the wasted computation in (1), it may be worth it. Still, you'd lose some of the advantages of JAX (e.g. with a dynamic mask size, you'll not be able to use `jit`).

Does that help answer your question?
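To make approach (1) concrete, here is a minimal sketch. The `expensive_fn` below is a hypothetical stand-in for your real per-entry computation: compute it for every entry, then use `jnp.where` with the mask to select the results you want. Because all shapes stay static, the whole thing remains both jittable and differentiable.

```python
import jax
import jax.numpy as jnp

def expensive_fn(x):
    # Hypothetical stand-in for the real per-entry computation.
    return jnp.sin(x) ** 2 + jnp.cos(x)

@jax.jit
def masked_apply(inputs, mask):
    # Compute the function for every entry (fully parallel on the device)...
    outputs = jax.vmap(expensive_fn)(inputs)
    # ...then mask out the results we don't want. Shapes stay static,
    # so this stays compatible with jit and grad.
    return jnp.where(mask, outputs, 0.0)

inputs = jnp.arange(8.0)
mask = inputs > 3.0

print(masked_apply(inputs, mask))
# Gradients flow only through the unmasked entries:
print(jax.grad(lambda x: masked_apply(x, mask).sum())(inputs))
```

One caveat: if the computation can produce NaN or inf on the masked-out entries, gradients through `jnp.where` may need the usual double-`where` treatment (masking the inputs before the computation as well as the outputs after it) to stay clean.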