Which is the best way to jit `nonzero` as part of a MSE? #9078

BBischof · 2022-01-02T09:21:18Z

I was working on some matrix factorization demos in JAX, and I realized that if I want to compute the loss over only the observed entries, I need to call nonzero inside my jit-ed loss function. That in turn requires knowing the number of nonzero entries ahead of time, and without it I get the error:

The size argument of jnp.nonzero must be statically specified to use jnp.nonzero within JAX transformations.
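For context, here is a minimal sketch of what triggers this error and how a static size avoids it. The toy array `A` and the function names `nonzero_dynamic` and `nonzero_static` are made up for illustration and are not taken from the Colab.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy ratings matrix; zeros mark unobserved entries.
A = jnp.array([[5.0, 0.0, 3.0],
               [0.0, 0.0, 1.0]])

@jax.jit
def nonzero_dynamic(A):
    # The output length depends on the values in A, which jit cannot know
    # while tracing, so this call is rejected.
    return jnp.nonzero(A)

try:
    nonzero_dynamic(A)
    raised = False
except Exception:  # JAX reports the "statically specified" error here
    raised = True

@jax.jit
def nonzero_static(A):
    # size is a Python int fixed at trace time, so the output shape is known.
    return jnp.nonzero(A, size=3)

rows, cols = nonzero_static(A)  # row and column indices of the 3 nonzeros
```

If `size` is larger than the true number of nonzeros, the extra slots are padded (see the `fill_value` argument), so the count still has to be supplied from somewhere outside the traced function.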

This led me to play with several different approaches to the problem, and in the end I have about 4.5ish ways to solve it.

I discuss and demonstrate them in the Colab here.

Question:

Which of these do you think is the most JAX-idiomatic? Did I miss something obvious?

Additional questions:

Are there any performance expectations or pitfalls for the approaches in the Colab?

jakevdp · 2022-01-02T15:44:22Z

Maintainer

I would probably do this by changing your calculation so that it does not require constructing an explicit array of nonzero values; e.g.

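import jax.numpy as jnp
from jax import jit

def mse_observed_loss_2(A, params):
  U, V = params['users'], params['items']
  estimator = -(U @ V.T)  # negated prediction, so A + estimator is the residual
  square_err_mat = jnp.multiply(A + estimator, A + estimator)
  nonzero = (A != 0)  # boolean mask of observed entries; same shape as A
  # Zero out the unobserved entries, then divide by the number of observed ones.
  return jnp.where(nonzero, square_err_mat, 0).sum() / nonzero.sum()

# a, u, v are assumed to be defined as in the Colab.
jit(mse_observed_loss_2)(a, {'users': u, 'items': v})
# DeviceArray(80.666664, dtype=float32)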
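As a sanity check on the masking idea, the sketch below compares the jit-compiled masked loss against a plain NumPy reference that uses boolean indexing outside of jit. The toy `A` and `params` are made-up values, not the data from the Colab, and `mse_observed_masked` is a hypothetical name for the same logic written with an explicit subtraction.

```python
import numpy as np
import jax
import jax.numpy as jnp

def mse_observed_masked(A, params):
    U, V = params['users'], params['items']
    sq_err = (A - U @ V.T) ** 2
    observed = (A != 0)  # mask has the same static shape as A
    return jnp.where(observed, sq_err, 0).sum() / observed.sum()

# Hypothetical toy data: zeros in A mark unobserved ratings.
A = jnp.array([[5.0, 0.0, 3.0],
               [0.0, 4.0, 1.0]])
params = {'users': jnp.ones((2, 2)), 'items': jnp.ones((3, 2))}

masked = jax.jit(mse_observed_masked)(A, params)

# Reference computed eagerly with NumPy boolean indexing (data-dependent shape).
A_np = np.asarray(A)
pred_np = np.asarray(params['users'] @ params['items'].T)
obs = A_np != 0
reference = np.mean((A_np[obs] - pred_np[obs]) ** 2)
```

Both `masked` and `reference` average the squared error over the four observed entries only; the masked version simply never materializes an array whose length depends on the data.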

If you want a more compact version of this same logic, you can use the where argument of the mean reduction to get the same result:

def mse_observed_loss_3(A, params):
  U, V = params['users'], params['items']
  estimator = -(U @ V.T)
  # where= restricts the mean to the observed (nonzero) entries of A.
  return jnp.multiply(A + estimator, A + estimator).mean(where=(A != 0))
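A small sketch, under made-up toy inputs, checking that the `where`/`sum` form and the `mean(where=...)` form agree in both value and gradient under `jit`. The names `mse_where_sum` and `mse_mean_where` are hypothetical stand-ins for the two functions above.

```python
import jax
import jax.numpy as jnp

def mse_where_sum(A, params):
    U, V = params['users'], params['items']
    sq_err = (A - U @ V.T) ** 2
    observed = (A != 0)
    return jnp.where(observed, sq_err, 0).sum() / observed.sum()

def mse_mean_where(A, params):
    U, V = params['users'], params['items']
    return jnp.mean((A - U @ V.T) ** 2, where=(A != 0))

# Hypothetical toy data; zeros in A mark unobserved entries.
A = jnp.array([[5.0, 0.0, 3.0],
               [0.0, 4.0, 1.0]])
params = {
    'users': jnp.array([[1.0, 0.5], [0.2, 1.5]]),
    'items': jnp.array([[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]),
}

loss_a = jax.jit(mse_where_sum)(A, params)
loss_b = jax.jit(mse_mean_where)(A, params)

# Gradients with respect to the params dict (argument 1).
grads_a = jax.jit(jax.grad(mse_where_sum, argnums=1))(A, params)
grads_b = jax.jit(jax.grad(mse_mean_where, argnums=1))(A, params)
```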

1 reply

BBischof Jan 2, 2022
Author

Thanks, Jake. I missed where; that's a great callout. Will add it to the collection of approaches 😅.

BBischof · 2022-01-03T00:15:57Z

Author

I wanted to report back because I went ahead and did some performance testing (on CPU).

Here's the leaderboard:

4_ML (Sparse with Tree-math): 2.3249481759994524
3_ML (Sparse Representation): 2.7713216399988596
6_ML (mean(where=(A != 0))): 3.2275579659999494
5_ML (where nonzero): 3.3596466340004554
2_ML (Static_argnums): 7.054447947999506
1_ML (Partial): 202.92161694900096

Edit:
I realized that I didn't follow the proper JAX benchmarking advice, so I went and redid it.

1 (Partial): 100 loops, best of 5: 56.5 ms per loop
2 (Static_argnums): 100 loops, best of 5: 72.3 ms per loop
3 (Sparse Representation): 100 loops, best of 5: 17.7 ms per loop
4 (Sparse with Tree-math): 100 loops, best of 5: 18.2 ms per loop
5 (where nonzero): 100 loops, best of 5: 29.1 ms per loop
6 (mean(where=(A != 0))): 100 loops, best of 5: 27.9 ms per loop
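A minimal sketch of that benchmarking pattern, assuming the advice in question is the usual one: run the jitted function once first so compilation is excluded from the timing, and call `block_until_ready()` so asynchronous dispatch does not hide the real run time. The loss function, array sizes, and random data here are placeholders, not the MovieLens setup from the notebook.

```python
import timeit
import jax
import jax.numpy as jnp

def mse_observed(A, params):
    U, V = params['users'], params['items']
    return jnp.mean((A - U @ V.T) ** 2, where=(A != 0))

# Placeholder data: a small random matrix with roughly half its entries zeroed.
key_a, key_m, key_u, key_v = jax.random.split(jax.random.PRNGKey(0), 4)
A = jax.random.uniform(key_a, (50, 40)) * (jax.random.uniform(key_m, (50, 40)) > 0.5)
params = {
    'users': jax.random.normal(key_u, (50, 20)),
    'items': jax.random.normal(key_v, (40, 20)),
}

loss_fn = jax.jit(mse_observed)
loss_fn(A, params).block_until_ready()  # warm-up call: triggers compilation

def run_once():
    # Block so the timer measures the computation, not just the dispatch.
    loss_fn(A, params).block_until_ready()

# Best of 5 repeats of 100 calls each, reported per call in seconds.
best_seconds = min(timeit.repeat(run_once, number=100, repeat=5)) / 100
```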

Quick comments:

I did this with the MovieLens (ML) dataset and two random factorizing matrices. I chose this test because MovieLens is such a common benchmark for matrix factorization that it seemed very appropriate. I used 20 latent dimensions because it's a reasonable choice and I didn't feel like doing HPO for this.
I didn't test on GPU because I can't figure out how to do it on Colab with this version of JAX. If someone can point me to working docs for this I'd appreciate it (I need to figure this out anyhow, but didn't think this demo warranted the extra effort).
When I was trying to benchmark the static_argnums example, it wouldn't let me jit(grad()) a function with static_argnums because the static argument wasn't hashable. Is there a trick that I don't know about? It's possible the poor performance of method 2 is a red herring.
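On the hashability question, one possible workaround is sketched below; it is not necessarily what method 2 in the notebook does. Static arguments to `jit` must be hashable, and arrays are not, so the sketch passes the number of observed entries as a plain Python int computed outside the jitted function. The name `mse_observed_static`, the argument `n_obs`, and the toy data are all hypothetical.

```python
import jax
import jax.numpy as jnp

def mse_observed_static(A, params, n_obs):
    U, V = params['users'], params['items']
    # n_obs is a static Python int, so the index arrays have a known length.
    rows, cols = jnp.nonzero(A, size=n_obs)
    err = A[rows, cols] - (U @ V.T)[rows, cols]
    return jnp.mean(err ** 2)

# Hypothetical toy data; zeros in A mark unobserved entries.
A = jnp.array([[5.0, 0.0, 3.0],
               [0.0, 4.0, 1.0]])
params = {
    'users': jnp.array([[1.0, 0.5], [0.2, 1.5]]),
    'items': jnp.array([[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]),
}

# Count the observed entries eagerly and convert to a hashable Python int.
n_obs = int(jnp.count_nonzero(A))

# Differentiate w.r.t. params (argument 1); mark n_obs (argument 2) as static.
grad_fn = jax.jit(jax.grad(mse_observed_static, argnums=1), static_argnums=2)
grads = grad_fn(A, params, n_obs)

# Cross-check against the masked formulation, which needs no static argument.
def mse_masked(A, params):
    U, V = params['users'], params['items']
    return jnp.mean((A - U @ V.T) ** 2, where=(A != 0))

grads_masked = jax.jit(jax.grad(mse_masked, argnums=1))(A, params)
```

One trade-off to keep in mind with this design: each distinct value of `n_obs` triggers a fresh compilation, so it fits best when the set of observed entries is fixed across calls.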

Everything is still in the same notebook here.

