add utils.set_determinism for reproducibility (#576)
This PR:
1 - adds a set_determinism function that sets the seed for Python, PyTorch,
and CUDA, and applies deterministic settings for cuDNN.
2 - if the seed is None, no deterministic settings are applied. This matters
because turning off cuDNN benchmarking to ensure determinism can also
hurt performance.
3 - note that for the None case, we revert / ensure cuDNN is set back to
non-deterministic mode with benchmarking/auto-tuning enabled, in case people
are toggling between seeded and unseeded runs.
This lack of determinism negatively impacted work with AWS, where we
ended up with variations in loss curves while running fp8 for our joint
blog post that appeared to come from fp8 but were instead likely due to the
lack of determinism in titan.
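For reference, here is a minimal sketch of what a helper like this typically looks like; only the name `set_determinism` comes from the PR, the body is illustrative and may differ from the actual implementation:

```python
import random

import torch


def set_determinism(seed: int | None) -> None:
    """Seed Python and PyTorch RNGs and toggle cuDNN determinism.

    If seed is None, restore cuDNN to its faster, non-deterministic
    benchmarking (auto-tuning) defaults instead.
    """
    if seed is None:
        # No seed requested: ensure cuDNN is back to non-deterministic,
        # benchmark mode in case a previous run toggled it.
        torch.backends.cudnn.deterministic = False
        torch.backends.cudnn.benchmark = True
        return

    # Seed Python, PyTorch (CPU), and all CUDA devices.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Trade performance for reproducibility: use deterministic cuDNN
    # kernels and disable benchmark-based algorithm selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```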
Testing - I ran multiple small runs with 7B while rotating between three
seeds and saw ending loss values that consistently matched their respective
seeds.
This PR does not set per-worker seeds for the dataloader since we do not
shuffle at the moment, but that could become a future source of randomness
that will need to be seeded if we add shuffling (a sketch of the usual
approach is below).
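If we do add shuffling later, the standard PyTorch recipe is a seeded generator plus a worker_init_fn; the names `seed_worker` and `build_loader` below are illustrative and not part of this PR:

```python
import random

import torch
from torch.utils.data import DataLoader


def seed_worker(worker_id: int) -> None:
    # Each worker derives a distinct but reproducible seed from the base
    # seed PyTorch assigns to its process.
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)


def build_loader(dataset, seed: int, batch_size: int) -> DataLoader:
    # A seeded generator makes the shuffle order reproducible across runs.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=2,
        worker_init_fn=seed_worker,
        generator=g,
    )
```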