How to shape Pytree's correctly for use with and after `vjp`. #9980

joeryjoery · 2022-03-21T16:56:43Z

joeryjoery
Mar 21, 2022

I'm trying to implement a Gauss-Newton vector product abstractly according to vjp(f, hvp(g, vjp(f, primals, tangents))) as the GN matrix for a function composition g o f is written as J_f^T @ H_g @ J_f. This matrix is equal to the hessian of g o f when f is a linear function, or g is at an optimum.

The simplest implementation is:

def hvp(fun, p, t):
    return jax.jvp(jax.jacrev(fun, argnums=0), p, t)


def gvp(inner_fun, outer_fun, p, t):
    y, f_lin = jax.linearize(f, *p)
    dz, Gv = hvp(lambda a: outer_fun(inner_fun(a)), p, t)
    return y, dz, Gv


# Test code
f = lambda x: jnp.sin(x)
g = lambda x: jnp.square(x) / 2
z = lambda x: g(x).sum()


s = jnp.pi / 8
print(gvp(f, g, (s, ), (s, ))[2])
print(gvp(f, z, (s, ), (s, ))[2])
print(gvp(f, g, (jnp.asarray([s]), ), (jnp.asarray([s]), ))[2])
print(gvp(f, z, (jnp.asarray([s]), ), (jnp.asarray([s]), ))[2])
>>> 0.27768016
>>> 0.27768016
>>> [[0.27768016]]
>>> [0.27768016]

# Test with some variable data containers
s = jnp.tile(s, 3)  
print(gvp(f, g, (s, ), (s, ))[2])
print(gvp(f, z, (s, ), (s, ))[2])
print(gvp(f, g, (jnp.asarray([s]), ), (jnp.asarray([s]), ))[2])
print(gvp(f, z, (jnp.asarray([s]), ), (jnp.asarray([s]), ))[2])
>>> [[0.27768016 0.         0.        ]
 [0.         0.27768016 0.        ]
 [0.         0.         0.27768016]]
>>> [0.27768016 0.27768016 0.27768016]
>>> [[[[0.27768016 0.         0.        ]]
  [[0.         0.27768016 0.        ]]
  [[0.         0.         0.27768016]]]]
>>> [[0.27768016 0.27768016 0.27768016]]

It seems that this method almost always works (at least for my testing). However, this is not really efficient. I don't want to call the Hessian of a linearized network when I know this will only contain zeros. Typically, f will be much more expensive to compute whereas g will usually be trivial.

Hence why the composition of vjp - hvp - jvp would be useful. I now have the following for scalars:

def gvp(inner_fun, outer_fun, p, t):
    y, Jt = jax.jvp(inner_fun, p, t)

    dz, HJt = hvp(outer_fun, (y,), (Jt,))

    y, vjp_fun = jax.vjp(inner_fun, *p)
    Gv = vjp_fun(HJt)

    return y, dz, Gv


# Test code
s = jnp.pi / 8
print(gvp(f, g, (s, ), (s, ))[2])
print(gvp(f, z, (s, ), (s, ))[2])
>>> (DeviceArray(0.33518964, dtype=float32, weak_type=True),)
>>> (DeviceArray(0.33518964, dtype=float32, weak_type=True),)

s = jnp.tile(s, 3)
print(gvp(f, g, (s, ), (s, ))[2])  # ValueError: Shape of cotangent input to vjp pullback ...

The first thing that is not going as expected is that the vjp yields the output value inside of a tuple. Now this is easily fixed by Gv[0], however, that only works in this very specific scenario. This would break if f was a function over f: R^2 -> R^2.

The ValueError should be fixed by vmap, however, this requires me to add "hacks" that just make it work on one extra occasion at the cost of additional problems:

def gvp_vmapped(inner_fun, outer_fun, p, t):
    y, Jt = jax.jvp(inner_fun, p, t)

    dz, HJt = hvp(outer_fun, (y,), (Jt,))
    
    HJt = jax.tree_map(jnp.atleast_1d, HJt)  # vmap hack
    y, vjp_fun = jax.vjp(inner_fun, *p)
    Gv = vmap(vjp_fun)(*jax.tree_leaves(HJt))
    
    return y, dz, Gv


# Test code
s = jnp.pi / 8
print(gvp_vmapped(f, g, (s, ), (s, ))[2])
print(gvp_vmapped(f, z, (s, ), (s, ))[2])
>>> (DeviceArray([0.33518964], dtype=float32, weak_type=True),)
>>> (DeviceArray([0.33518964], dtype=float32, weak_type=True),)

s = jnp.tile(s, 3)
print(gvp_vmapped(f, g, (s, ), (s, ))[2])
>>> (DeviceArray([[0.33518964, 0.        , 0.        ],
             [0.        , 0.33518964, 0.        ],
             [0.        , 0.        , 0.33518964]],            dtype=float32, weak_type=True),)
print(gvp_vmapped(f, z, (s, ), (s, ))[2])  # ValueError: Shape of cotangent input to vjp ...

Now, multiple things are going wrong... The scalar functions yield a result that is now a (1,) array inside of a tuple, and the atleast_1d hack is not general enough to always work. Calling jnp.squeeze also doesn't work, because this would also remove dimensions that are correctly added (see the behaviour of the first hvp).

So I'm asking for any guidance or tips on how to correctly manipulate the shapes and sizes, such that my custom gvp function behaves as the hvp of a linearized function...

Answered by YouJiacheng

Mar 22, 2022

@joeryjoery Emmm, I don't know why we need a nested tree_map here. (Assume: pack all arguments into single pytree).

import jax
import jax.numpy as jnp

def nested_vmap(fun, n: int):
    for _ in range(n):
        fun = jax.vmap(fun)
    return fun

def gvp(inner_fun, outer_fun, p_in, t_in):
    # p_in: pytree_0
    # t_in: pytree_1
    # inner_fun: pytree_1 -> pytree_2
    # outer_fun: pytree_1 -> pytree_2
    p_out, f_l = jax.linearize(inner_fun, p_in) # (pytree_1), (pytree_0 -> pytree_1)
    f_lt_tuple = jax.linear_transpose(f_l, p_in) # pytree_1 -> pytree_0
    f_lt = lambda x: f_lt_tuple(x)[0] # primals tuple only contain one primal
    Jt = f_l(t_in) # pytree_1
    d_outer, HJt = jax.j…

View full answer

YouJiacheng · 2022-03-21T18:49:11Z

YouJiacheng
Mar 21, 2022

def gvp(inner_fun, outer_fun, p, t):
    y, f_lin = jax.linearize(f, *p)
    dz, Gv = hvp(lambda a: outer_fun(inner_fun(a)), p, t)
    return y, dz, Gv

I think it should be

def gvp(inner_fun, outer_fun, p, t):
    y, f_lin = jax.linearize(inner_fun, *p)
    dz, Gv = hvp(lambda a: outer_fun(y + f_lin(jax.tree_map(lambda x, y: x - y, a, p))), p, t)
    # or hvp(lambda a: outer_fun(y + f_lin(a)), jax.tree_map(lambda x: jnp.zeros_like(x), p), t)
    return y, dz, Gv

otherwise gvp is not a Gauss-Newton vector product, but just a hvp of g o f

Emmm, are you sure it is not efficient?
I'll try to find out a method.

10 replies

YouJiacheng Mar 22, 2022

For pytree valued g, we need tree_map + vmap. Hmmm.

joeryjoery Mar 22, 2022
Author

Hey, thanks a lot for your help.

At the moment, the best I was able to get is:

def my_gvp(inner_fun, outer_fun, p, t):
    y, f_lin = jax.linearize(inner_fun, *p)
    z = outer_fun(y)
    
    f_lin_T = jax.linear_transpose(f_lin, *p)  # y-like to p-like
    dz, HJt = hvp(outer_fun, (y,), (f_lin(*t),))
    
    if not all(jax.tree_leaves(jax.tree_map(jnp.isscalar, HJt))):
        # Reshape HJt to be compatible with VJP function.
        HJt = jax.tree_multimap(lambda a, b: jnp.reshape(a, (-1, *jnp.shape(b))), HJt, y)
        Gt = vmap(f_lin_T)(HJt) 
        # Cast vmap + output dimensions to desired shapes 
        Gt = jax.tree_map(
            lambda x: jax.tree_multimap(lambda a, b, c: jnp.reshape(a, (*jnp.shape(c), *jnp.shape(b))), 
                              jax.tree_leaves(x), jax.tree_leaves(y), jax.tree_leaves(z)),
            Gt
        )
    else:
        Gt = jax.tree_leaves(f_lin_T(HJt))
        
    # Unpack GVP result back to primal shape.
    Gt = jax.tree_unflatten(jax.tree_structure(p), jax.tree_leaves(Gt))
    if len(p) == 1: Gt = Gt[0]

    return y, dz, Gt

It feels a bit hacky due to the explicit reshaping of HJt and Gt, but for my small testing suite, it does behave as expected.
Also, I define the hvp function as:

def hvp(fun, p, t):
    a = 0 if len(p) == 1 else jnp.arange(len(p))
    return jax.jvp(jax.jacrev(fun, argnums=a), p, t)

as the argnums can omit the primals if p is multivariate.

joeryjoery Mar 22, 2022
Author

I don't really have nice testing functions for this, but the function I posted in my last reply already doesn't behave as desired when the shapes of (y, z) -> (1, 1) when the primals are ()...

joeryjoery Mar 22, 2022
Author

Essentially, I guess that my method would work if the line:

Gt = jax.tree_map(
    lambda x: jax.tree_multimap(lambda a, b, c: jnp.reshape(a, (*jnp.shape(c), *jnp.shape(b))), 
                      jax.tree_leaves(x), jax.tree_leaves(y), jax.tree_leaves(z)),
    Gt
)

would replace y with primals, but that's quite difficult due to tree mismatches.

Gt = jax.tree_map(
    lambda x: jax.tree_multimap(lambda a, b, c: jnp.reshape(a, (*jnp.shape(c), *jnp.shape(b))), 
                      jax.tree_leaves(x), primals, jax.tree_leaves(z)),
    Gt
)  # ValueError

joeryjoery Mar 22, 2022
Author

Last hack to make it work seemingly well (at least for my unit tests):

Gt = jax.tree_multimap(
            lambda a, b: jax.tree_multimap(lambda a, b, c: jnp.reshape(a, (*jnp.shape(c), *jnp.shape(b))),
                                           jax.tree_leaves(a), jax.tree_leaves(b), jax.tree_leaves(z)),
            Gt, jax.tree_map(lambda x: [x], primals)
        )

I did find out that this can also break if g = lambda *a: (g_func(*a), ) with or without an aux bool. Guess I'll need to live with that lol.

YouJiacheng · 2022-03-22T10:31:48Z

YouJiacheng
Mar 22, 2022

@joeryjoery Emmm, I don't know why we need a nested tree_map here. (Assume: pack all arguments into single pytree).

import jax
import jax.numpy as jnp

def nested_vmap(fun, n: int):
    for _ in range(n):
        fun = jax.vmap(fun)
    return fun

def gvp(inner_fun, outer_fun, p_in, t_in):
    # p_in: pytree_0
    # t_in: pytree_1
    # inner_fun: pytree_1 -> pytree_2
    # outer_fun: pytree_1 -> pytree_2
    p_out, f_l = jax.linearize(inner_fun, p_in) # (pytree_1), (pytree_0 -> pytree_1)
    f_lt_tuple = jax.linear_transpose(f_l, p_in) # pytree_1 -> pytree_0
    f_lt = lambda x: f_lt_tuple(x)[0] # primals tuple only contain one primal
    Jt = f_l(t_in) # pytree_1
    d_outer, HJt = jax.jvp(jax.jacrev(outer_fun, argnums=0), (p_out,), (Jt,))
    # pytree_2(pytree_1), pytree_2(pytree_1) with prepended shape leaves
    shapes = jax.eval_shape(outer_fun, p_out) # pytree_2
    Gt = jax.tree_map(lambda s, h: nested_vmap(f_lt, len(s.shape))(h), shapes, HJt) # h: pytree_1
    return p_out, d_outer, Gt

def f(x):
    return x

def g(x):
    return jax.tree_map(lambda x: x ** 2, x)

x = (jnp.ones((2,2)), jnp.ones((2,2)))
print(gvp(f, g, x, x)[2])

Output:

((DeviceArray([[[[2., 0.],
               [0., 0.]],

              [[0., 2.],
               [0., 0.]]],


             [[[0., 0.],
               [2., 0.]],

              [[0., 0.],
               [0., 2.]]]], dtype=float32), 
DeviceArray([[[[0., 0.],
               [0., 0.]],

              [[0., 0.],
               [0., 0.]]],


             [[[0., 0.],
               [0., 0.]],

              [[0., 0.],
               [0., 0.]]]], dtype=float32)), 
(DeviceArray([[[[0., 0.],
               [0., 0.]],

              [[0., 0.],
               [0., 0.]]],


             [[[0., 0.],
               [0., 0.]],

              [[0., 0.],
               [0., 0.]]]], dtype=float32), 
DeviceArray([[[[2., 0.],
               [0., 0.]],

              [[0., 2.],
               [0., 0.]]],


             [[[0., 0.],
               [2., 0.]],

              [[0., 0.],
               [0., 2.]]]], dtype=float32)))

4 replies

YouJiacheng Mar 22, 2022

@joeryjoery WDYT?

YouJiacheng Mar 22, 2022

minor fix for linear_transpose(f_l) actually returns a tuple like primals. (here primals = (primal,))

joeryjoery Mar 22, 2022
Author

Hey, thanks so much. I think this is actually it. For all my tests so far this seem to work correctly (the same as HVP over g o f_lin).

So, the "magic" happens in the tree-mapped nested vmap functions. It does exactly what I wanted: correctly mapping the output dimensions without manual reshaping.

If I'm not asking too much, can I ask a couple of things about this solution:

What does the nested vmap "map" over, is that each input dimension contained in HJt?
Is it neccesary to keep the arguments in tuples? From small testing this also seems to work fine (useful for multivariate functions):

def my_gvp(inner_fun, outer_fun, p_in, t_in):
    # p_in: pytree_0
    # t_in: pytree_1
    # inner_fun: pytree_1 -> pytree_2
    # outer_fun: pytree_1 -> pytree_2
    p_out, f_l = jax.linearize(inner_fun, *p_in) # (pytree_1), (pytree_0 -> pytree_1)
    f_lt = jax.linear_transpose(f_l, *p_in) # pytree_1 -> pytree_0
    Jt = f_l(*t_in) # pytree_1
    argnums = 0 if len(p_out) == 1 else jnp.arange(len(p_out))
    d_outer, HJt = jax.jvp(jax.jacrev(outer_fun, argnums=argnums), (p_out,), (Jt,))
    # pytree_2(pytree_1), pytree_2(pytree_1) with prepended shape leaves
    shapes = jax.eval_shape(outer_fun, p_out) # pytree_2
    Gt = jax.tree_map(lambda s, h: nested_vmap(f_lt, len(jnp.shape(s)))(h), shapes, HJt) # h: pytree_1
   
    if len(p_in) == 1: Gt = Gt[0]  # Unpack for univariate inner function inputs.
    return p_out, d_outer, Gt

YouJiacheng Mar 22, 2022

nested vmap map over the dimensions of outer_fun ouput leaves, which are prepended to HJt leaves during jacrev. See also How to shape Pytree's correctly for use with and after `vjp`. #9980 (reply in thread)
Not necessary. I just pack all arguments into a single pytree to simplify analysis and get rid of branches. I think doing so can make the code more maintainable. In addition, if inner_fun can be multivariate while outer_fun can't, it can cause confusion and mistake. Moreover, it is extremely easy to convert a multivariate function to a single-pytree function without performance loss in JAX:

lambda args: f(*args)

How to shape Pytree's correctly for use with and after vjp. #9980

Uh oh!

Uh oh!

joeryjoery Mar 21, 2022

Replies: 2 comments · 14 replies

Uh oh!

Uh oh!

YouJiacheng Mar 21, 2022

Uh oh!

YouJiacheng Mar 22, 2022

Uh oh!

joeryjoery Mar 22, 2022 Author

Uh oh!

joeryjoery Mar 22, 2022 Author

Uh oh!

joeryjoery Mar 22, 2022 Author

Uh oh!

joeryjoery Mar 22, 2022 Author

Uh oh!

Uh oh!

YouJiacheng Mar 22, 2022

Uh oh!

YouJiacheng Mar 22, 2022

Uh oh!

YouJiacheng Mar 22, 2022

Uh oh!

joeryjoery Mar 22, 2022 Author

Uh oh!

Uh oh!

YouJiacheng Mar 22, 2022

How to shape Pytree's correctly for use with and after `vjp`. #9980

joeryjoery
Mar 21, 2022

Replies: 2 comments 14 replies

YouJiacheng
Mar 21, 2022

joeryjoery Mar 22, 2022
Author

joeryjoery Mar 22, 2022
Author

joeryjoery Mar 22, 2022
Author

joeryjoery Mar 22, 2022
Author

YouJiacheng
Mar 22, 2022

joeryjoery Mar 22, 2022
Author