Optimizer State Sharding #386
Unanswered
Sanger2000 asked this question in Ideas
Replies: 0 comments
It seems that there is currently no open-source implementation of optimizer state sharding (ZeRO) in JAX. It would be a great addition: splitting the optimizer state across devices instead of replicating it would make it much simpler to train large models with Adam or AdamW.
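To make the idea concrete, here is a rough sketch of what I have in mind, not a proposed implementation. It assumes a single flat parameter vector whose length is divisible by the device count, a hand-written Adam step (no optimizer library), and a one-dimensional mesh with an axis I have arbitrarily named `"data"`. The helper names `init_state` and `adam_update` are made up for illustration. Parameters and gradients stay replicated, while the two Adam moments are laid out so that each device holds only its own slice; the compiler is left to insert whatever communication is needed.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over every visible device; the axis name "data" is an assumption.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
replicated = NamedSharding(mesh, P())      # full copy on each device
sharded = NamedSharding(mesh, P("data"))   # split along the leading axis

def init_state(params):
    # Hypothetical helper: Adam moments placed so each device stores 1/N of them.
    zeros = jnp.zeros_like(params)
    return {
        "mu": jax.device_put(zeros, sharded),
        "nu": jax.device_put(zeros, sharded),
        "step": jnp.zeros((), jnp.int32),
    }

def adam_update(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Plain Adam with bias correction, written out by hand.
    step = state["step"] + 1
    mu = b1 * state["mu"] + (1 - b1) * grads
    nu = b2 * state["nu"] + (1 - b2) * grads**2
    mu_hat = mu / (1 - b1**step)
    nu_hat = nu / (1 - b2**step)
    new_params = params - lr * mu_hat / (jnp.sqrt(nu_hat) + eps)
    return new_params, {"mu": mu, "nu": nu, "step": step}

# Keep the moments sharded on output; parameters come back replicated.
step_fn = jax.jit(
    adam_update,
    out_shardings=(replicated, {"mu": sharded, "nu": sharded, "step": replicated}),
)

n_devices = len(jax.devices())
params = jax.device_put(jnp.zeros(8 * n_devices), replicated)
grads = jax.device_put(jnp.ones(8 * n_devices), replicated)
state = init_state(params)
params, state = step_fn(params, grads, state)
```

On a single device this degenerates to ordinary Adam, so it only saves memory when several devices are visible. A full ZeRO-style implementation would presumably also reduce-scatter the gradients rather than keep them replicated and would handle arbitrary parameter pytrees; both are left out here.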