-
Was wondering the same thing! Does anyone have any examples of proper usage of optax.ema?
-
What do you mean by proper usage? You could create some variant of Adam by chaining rmsprop and ema, for example; that would probably work well.
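For illustration, here is a minimal sketch of what such a chain could look like; the learning rate and decay values are placeholders, not recommendations:

import optax

# An Adam-like optimizer assembled from smaller transformations:
# scale_by_rms provides the RMSProp-style second-moment scaling, and
# ema smooths the resulting updates, acting like a first-moment term.
learning_rate = 1e-3  # placeholder value
optimizer = optax.chain(
    optax.scale_by_rms(),         # divide updates by a running RMS of the gradients
    optax.ema(decay=0.9),         # exponential moving average of the scaled updates
    optax.scale(-learning_rate),  # flip sign and scale for gradient descent
)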
-
This is my final implementation of an exponential moving average (EMA) over the parameters of my neural network:

from typing import Any, NamedTuple

import jax
import optax

PyTree = Any  # any nested structure of arrays
OptState = optax.OptState

class State(NamedTuple):
    params: PyTree
    opt_state: OptState

grads = jax.grad(loss_fn)(state.params)
updates, new_opt_state = optimizer.update(grads, state.opt_state)
new_params = optax.incremental_update(
    new_tensors=optax.apply_updates(state.params, updates),
    old_tensors=state.params,
    step_size=0.999,
)  # exponential moving average between the previous params and the params proposed by the optimizer
new_state = State(params=new_params, opt_state=new_opt_state)

This avoids needing to separately track an EMA copy of the parameters.
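For completeness, a minimal sketch of the surrounding setup this snippet assumes; loss_fn, the optimizer choice, and the initial parameters below are placeholders rather than anything from the original post:

import jax.numpy as jnp
import optax

def loss_fn(params):
    # placeholder loss; replace with the real model loss
    return jnp.sum(params["w"] ** 2)

init_params = {"w": jnp.ones(3)}
optimizer = optax.adam(1e-3)  # placeholder optimizer
state = State(params=init_params, opt_state=optimizer.init(init_params))

Each training step then recomputes grads, updates, new_params, and new_state as above, carrying the new State forward.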
-
Hello, everyone! I'm training my model with the adamw optimizer and an exponential moving average, and I used optax.chain to combine gradient clipping, the optimizer, and optax.ema. However, when I started training, the loss seems hard to converge:

Did I use optax.ema properly? The version of optax is 0.1.3.
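For reference, a sketch of the kind of chain the post describes; the clipping threshold, learning rate, and decay are placeholder values, since the actual settings only appear in the screenshot:

import optax

optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),   # gradient clipping (placeholder threshold)
    optax.adamw(learning_rate=1e-3),  # placeholder learning rate
    optax.ema(decay=0.999),           # EMA over the updates emitted by adamw
)

Note that optax.ema placed in a chain averages the updates produced by the preceding transformations; it does not track an exponential moving average of the parameters themselves, which is what the incremental_update approach above achieves.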