Selective Multi-task Gradient #9989
Say I have a backbone and two heads, and I only want to backpropagate gradient from one of those two heads. A straightforward way to do this in PyTorch is simply to forward the backbone and detach its output in one of the two heads. However, I cannot think of an efficient way to do this in JAX without forwarding the backbone twice, once for each head.
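For reference, a minimal, self-contained PyTorch sketch of the detach approach described above (module sizes, losses, and names are made up for illustration):

import torch
import torch.nn as nn

# Toy stand-ins for the real backbone and heads.
backbone = nn.Linear(8, 16)
head0 = nn.Linear(16, 4)
head1 = nn.Linear(16, 4)
x, y0, y1 = torch.randn(2, 8), torch.randn(2, 4), torch.randn(2, 4)

z = backbone(x)           # single backbone forward pass
h0 = head0(z)             # this branch backpropagates into the backbone
h1 = head1(z.detach())    # head1 still trains, but the backbone gets no gradient from it
loss = nn.functional.mse_loss(h0, y0) + nn.functional.mse_loss(h1, y1)
loss.backward()           # backbone.weight.grad comes only from the head0 branch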
Replies: 1 comment
You can use jax.lax.stop_gradient as JAX's detach. Note that you will either need to compile the train_step function twice, with a static argument indicating the selection, or perform an all-zero backward pass through the head you are not training. The static-argument version:

from functools import partial
import jax

@partial(jax.jit, static_argnums=0)
def train_step(stop_grad_head0, params, x, y0, y1):
    def f(params):
        z = backbone(params['backbone'], x)
        h0 = head0(params['head0'], z)
        h1 = head1(params['head1'], z)
        if stop_grad_head0:
            return loss_fn(y0, y1, jax.lax.stop_gradient(h0), h1)  # stop grad through head 0
        return loss_fn(y0, y1, h0, jax.lax.stop_gradient(h1))  # stop grad through head 1
    loss, grad = jax.value_and_grad(f)(params)
    return loss, apply_grad(params, grad)
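A possible call site (params, x, y0, y1, and apply_grad come from the setup above and are only placeholders here); each boolean value of the static flag compiles its own version of train_step:

loss, params = train_step(True, params, x, y0, y1)   # head 0 output detached: only head 1 drives the gradient
loss, params = train_step(False, params, x, y0, y1)  # head 1 output detached: only head 0 drives the gradient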
The all-zero backward pass version, which avoids the extra compilation:

def f(params, k: float):
    # k in [0, 1]: k = 0 keeps gradients only through h0, k = 1 only through h1
    z = backbone(params['backbone'], x)
    h0 = head0(params['head0'], z)
    h0 = (1 - k) * h0 + k * jax.lax.stop_gradient(h0)  # value unchanged, gradient scaled by (1 - k)
    h1 = head1(params['head1'], z)
    h1 = k * h1 + (1 - k) * jax.lax.stop_gradient(h1)  # value unchanged, gradient scaled by k
    return loss_fn(y0, y1, h0, h1)
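A sketch of how this variant might be wrapped into a single jitted step (train_step_soft is a made-up name; the model functions and apply_grad are the same placeholders as above). Because k is an ordinary traced argument rather than a static one, switching heads does not trigger recompilation:

@jax.jit
def train_step_soft(params, x, y0, y1, k):
    def f(params):
        z = backbone(params['backbone'], x)
        h0 = head0(params['head0'], z)
        h0 = (1 - k) * h0 + k * jax.lax.stop_gradient(h0)
        h1 = head1(params['head1'], z)
        h1 = k * h1 + (1 - k) * jax.lax.stop_gradient(h1)
        return loss_fn(y0, y1, h0, h1)
    loss, grad = jax.value_and_grad(f)(params)
    return loss, apply_grad(params, grad)

# The same compiled function handles both selections:
loss, params = train_step_soft(params, x, y0, y1, 0.0)  # train only the head0 branch
loss, params = train_step_soft(params, x, y0, y1, 1.0)  # train only the head1 branch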
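If you want exactly the PyTorch behaviour from the question, where the deselected head still trains on its own loss but sends no gradient into the backbone, the same primitive can be applied to the backbone output instead of the head output. A sketch with the same placeholder names:

def f(params):
    z = backbone(params['backbone'], x)
    h0 = head0(params['head0'], z)  # this branch backpropagates into the backbone
    h1 = head1(params['head1'], jax.lax.stop_gradient(z))  # head1 params still get gradients; the backbone does not
    return loss_fn(y0, y1, h0, h1)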