How can I set some modules to stop learning when training in JAX? #10178
-
In PyTorch we can stop a specific module from learning by setting requires_grad=False on its parameters. How can I do the same in JAX? The module is defined like this: class XXX(nn.Module): Thanks
Replies: 4 comments 4 replies
-
Option 1

model_output = model.apply(params, model_input)
detached_model_output = jax.lax.stop_gradient(model_output)

Option 2

detached_params = jax.lax.stop_gradient(params)
model_output = model.apply(detached_params, model_input)

Note that params can be a pytree, i.e. a nested dict/tuple/list or a custom pytree type. This is suitable for flax and haiku: init will give you the params as a pytree, and apply will accept a pytree of params.

Option 3

Use https://github.com/patrick-kidger/equinox, then your model itself will be a pytree.
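Since params is a pytree, Option 2 can also be applied to just one sub-tree to freeze a single module while the rest keeps training. A minimal sketch, assuming Flax; the 'encoder'/'head' module names and the toy shapes are illustrative, not from this thread:

import jax
import jax.numpy as jnp
import flax.linen as nn

class Model(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(8, name='encoder')(x)  # sub-module we want to freeze
        return nn.Dense(1, name='head')(x)  # sub-module that keeps training

model = Model()
x = jnp.ones((4, 3))
y = jnp.ones((4, 1))
params = model.init(jax.random.PRNGKey(0), x)

def loss_fn(params, x, y):
    # Detach only the frozen sub-tree; gradients still flow to 'head'.
    frozen_encoder = jax.lax.stop_gradient(params['params']['encoder'])
    patched = {'params': {**params['params'], 'encoder': frozen_encoder}}
    pred = model.apply(patched, x)
    return jnp.mean((pred - y) ** 2)

grads = jax.grad(loss_fn)(params, x, y)
# grads['params']['encoder'] comes back as all zeros; grads['params']['head'] does not.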
-
And you may refer to
-
Use optax.set_to_zero together with optax.multi_transform.

params = {
    'a': {'x1': ..., 'x2': ...},
    'b': {'x1': ..., 'x2': ...},
}
param_labels = {
    'a': {'x1': 'freeze', 'x2': 'train'},
    'b': 'train',
}
optimizer_scheme = {
    'train': optax.adam(...),
    'freeze': optax.set_to_zero(),
}
optimizer = optax.multi_transform(optimizer_scheme, param_labels)

See the Freeze Parameters Example for details. (Taken from ayaka14732/tpu-starter, the 'Freeze certain model parameters' section.)
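As an end-to-end sketch of this recipe (the toy parameter values, learning rate, and quadratic loss are placeholders, not part of the original example):

import jax
import jax.numpy as jnp
import optax

params = {
    'a': {'x1': jnp.ones(3), 'x2': jnp.ones(3)},
    'b': {'x1': jnp.ones(3), 'x2': jnp.ones(3)},
}
param_labels = {
    'a': {'x1': 'freeze', 'x2': 'train'},
    'b': 'train',
}
optimizer = optax.multi_transform(
    {'train': optax.adam(1e-3), 'freeze': optax.set_to_zero()},
    param_labels,
)
opt_state = optimizer.init(params)

def loss_fn(params):
    # Toy quadratic loss over every leaf, just to produce nonzero gradients.
    return sum(jnp.sum(leaf ** 2) for leaf in jax.tree_util.tree_leaves(params))

grads = jax.grad(loss_fn)(params)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
# After the update, params['a']['x1'] is unchanged; all 'train'-labelled leaves have moved.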
-
If you follow @YouJiacheng's "Option 3" above and use Equinox as your neural network library (instead of Flax), then the Equinox docs already include an example for handling frozen layers here.
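For reference, a minimal sketch of that frozen-layer pattern, assuming a recent Equinox version; the MLP shape and the choice to train only the final layer are illustrative:

import equinox as eqx
import jax
import jax.numpy as jnp

model = eqx.nn.MLP(in_size=2, out_size=1, width_size=8, depth=2, key=jax.random.PRNGKey(0))

# Mark every leaf as frozen, then flag only the last layer's weight and bias as trainable.
filter_spec = jax.tree_util.tree_map(lambda _: False, model)
filter_spec = eqx.tree_at(
    lambda m: (m.layers[-1].weight, m.layers[-1].bias),
    filter_spec,
    replace=(True, True),
)

@eqx.filter_grad
def loss_fn(diff_model, static_model, x, y):
    # Recombine the trainable and frozen halves before calling the model.
    model = eqx.combine(diff_model, static_model)
    pred = jax.vmap(model)(x)
    return jnp.mean((pred - y) ** 2)

x = jnp.ones((16, 2))
y = jnp.ones((16, 1))
diff_model, static_model = eqx.partition(model, filter_spec)
grads = loss_fn(diff_model, static_model, x, y)
# grads only carries values for the leaves marked trainable in filter_spec.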