Efficient ways to sum only the tril part of a point-wise scalar-valued function (R^d -> R) evaluated on an n×n×d tensor. #9692

YouJiacheng · 2022-02-25T07:59:13Z

Update:
The n×n×d tensor is produced by the (self) pair-wise difference of an n×d tensor (the coordinates of n points, with d typically ≤ 3). That is why I only need the tril/triu part.
I am trying to implement the fast multipole method in JAX; any help would be appreciated. 🥰
In detail, there are 2 tasks:
Task A:

from functools import partial

import jax
import jax.numpy as jnp

n = 1000  # typical value: 100~1000
d = 3

@partial(jax.vmap, in_axes=(0, None))
@partial(jax.vmap, in_axes=(None, 0))
def f(x, y):
    return jax.lax.rsqrt(jnp.sum((x - y) ** 2))

# electrostatic energy in 3d
def g(xs):
    assert xs.shape == (n, d)
    return jnp.sum(jnp.tril(f(xs, xs), -1))
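
One way to avoid both the redundant upper-triangle work and the `rsqrt(0)` evaluations on the diagonal is to gather only the i > j pairs from the n×d coordinates before applying the kernel. The following is a minimal sketch, not a benchmarked recommendation: the name `pair_energy` and the choice to pass the index arrays as arguments (rather than closing over them as constants) are assumptions made for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp

@jax.jit
def pair_energy(xs, i_idx, j_idx):
    # xs: (n, d); i_idx, j_idx: (n*(n-1)/2,) integer arrays with i > j
    diff = xs[i_idx] - xs[j_idx]              # gather only the needed pairs
    sq_dist = jnp.sum(diff ** 2, axis=-1)     # strictly positive for distinct points
    return jnp.sum(jax.lax.rsqrt(sq_dist))

# Small example: 3 points in 3d.
xs = jnp.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0]])
i_idx, j_idx = np.tril_indices(xs.shape[0], k=-1)  # strict lower triangle
energy = pair_energy(xs, i_idx, j_idx)             # 1/1 + 1/2 + 1/sqrt(5)
grad = jax.grad(pair_energy)(xs, i_idx, j_idx)     # gradient w.r.t. xs only
```

Whether the gather beats a masked full matrix likely depends on the backend and on how expensive the kernel is, so this variant is worth timing rather than assuming it is faster.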

Task B:

n = 1000
d = 1

@partial(jax.vmap, in_axes=(0, None))
@partial(jax.vmap, in_axes=(None, 0))
def f(x, y):
    return jax.lax.log(jnp.sum((x - y) ** 2))

# electrostatic energy in 2d, but points are restricted to x axis.
def g(xs):
    assert xs.shape == (n, d)
    return jnp.sum(jnp.tril(f(xs, xs), -1))
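
Because the kernel here is symmetric in x and y, the tril sum equals half of the off-diagonal sum. The sketch below uses that fact and replaces the zero squared distances on the diagonal with 1 before taking the log, so `log(0)` is never evaluated; masking an infinite value after the fact is a common source of NaN gradients in JAX. The function name `energy_log` is hypothetical, and this is an illustration under those assumptions rather than a tuned implementation.

```python
import jax
import jax.numpy as jnp

@jax.jit
def energy_log(xs):
    # xs: (n, d)
    n = xs.shape[0]
    diff = xs[:, None, :] - xs[None, :, :]        # (n, n, d) pair-wise differences
    sq_dist = jnp.sum(diff ** 2, axis=-1)         # (n, n), zero on the diagonal
    off_diag = ~jnp.eye(n, dtype=bool)
    safe_sq = jnp.where(off_diag, sq_dist, 1.0)   # log(1) = 0, so the diagonal adds nothing
    return 0.5 * jnp.sum(jnp.log(safe_sq))        # symmetric kernel: tril sum = half of total

xs = jnp.array([[0.0], [1.0], [3.0]])
energy = energy_log(xs)            # log(1) + log(9) + log(4) = log(36)
grad = jax.grad(energy_log)(xs)    # finite, since log(0) is never evaluated
```

This still does the full n×n computation, so it trades redundant work for simple, gather-free indexing.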

Here is my use case and some implementations.
Applying jnp.tril after the point-wise function may be faster when the point-wise function is trivial, since lax.select is faster than lax.gather or lax.scatter, but the redundant computation can become significant as the point-wise function gets more complex. (In fact, a sum over the last dim is already complex enough to make post_tril slower.)
The efficiency of gradient evaluation should be considered as well.
In particular, the jit compilation time of pre_mask and pre_idx is painfully long if n >= 5000.

def point_wise_f(x):
    # maybe a mlp in practice
    # act on the last dim in a point-wise manner
    # out.shape == x.shape[:-1]
    return jnp.sum(x, axis=-1)

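@jax.jit
def pre_mask(x: jnp.ndarray):
    # x.shape == (n, n, d)
    # build the mask eagerly so it is concrete; boolean-mask indexing
    # needs a concrete mask under jit
    with jax.ensure_compile_time_eval():
        mask = jnp.tri(*x.shape[:2], k=-1, dtype=bool)
    return jnp.sum(point_wise_f(x[mask, :]))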

@jax.jit
def pre_idx(x: jnp.ndarray): 
    # x.shape == (n, n, d)
    n = x.shape[0]
    return jnp.sum(point_wise_f(x[jnp.tril_indices(n, -1, n)]))

@jax.jit
def post_tril(x: jnp.ndarray): 
    # x.shape == (n, n, d)
    return jnp.sum(jnp.tril(point_wise_f(x), -1))
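
To compare these variants on a given backend, forward and gradient timings need to exclude compilation and account for JAX's asynchronous dispatch. A minimal timing sketch follows; the helper name `bench`, the repeat count, and the 200×200×3 test shape are assumptions chosen for illustration, and `post_tril` is re-declared so the snippet runs on its own.

```python
import time
import jax
import jax.numpy as jnp

def bench(fn, *args, repeats=10):
    """Return mean wall-clock seconds per call, excluding compilation."""
    jax.block_until_ready(fn(*args))      # warm-up call triggers jit compilation
    start = time.perf_counter()
    for _ in range(repeats):
        # block so that asynchronous dispatch does not hide the real run time
        jax.block_until_ready(fn(*args))
    return (time.perf_counter() - start) / repeats

@jax.jit
def post_tril(x):
    # same computation as post_tril above, with point_wise_f inlined as a sum
    return jnp.sum(jnp.tril(jnp.sum(x, axis=-1), -1))

x = jax.random.normal(jax.random.PRNGKey(0), (200, 200, 3))
t_fwd = bench(post_tril, x)                        # forward pass
t_grad = bench(jax.jit(jax.grad(post_tril)), x)    # gradient w.r.t. x
```

The same helper can be applied to `pre_mask` and `pre_idx`; timing the first (compiling) call separately would show the compilation cost mentioned above.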

YouJiacheng · 2022-03-07T05:04:18Z
Author

😭

3 replies

mattjj Mar 14, 2022
Maintainer

Sorry for being slow to respond. We really appreciate your activity on this issue tracker!

(However, the last 2-3 weeks have been "performance review" time at Google, so a lot of our time has been pulled into administrative overheads, and we've been especially slow...)

Which backend are you using (CPU/GPU/TPU)?

Another approach may be to apply a convolution.

mattjj Mar 14, 2022
Maintainer

Can you share a full runnable example that exhibits what you have in mind (e.g. for representative sizes/shapes)?

YouJiacheng Mar 14, 2022
Author

@mattjj Thanks! I have updated my question with the whole task (not only an example).
I mainly use the GPU backend with n=100-1000, d=1-3 (I will run this computation for a large batch).
I think the main consideration should be the complexity of point_wise_f.
I originally needed to use an MLP as point_wise_f, in which case post_tril takes nearly 2x the time.
But now I only need to perform simple computations such as jnp.log and jnp.abs.
I am also trying to use the fast multipole method to reduce the complexity from O(n^2) to O(n), since the n×n×d tensor is actually produced by the pair-wise difference of an n×d tensor.
