SUMMARY:
- When the training args were updated, `place_model_on_device` was missed.
As a result, when the trainer is created (which we really should not
be doing during oneshot...), the value defaults to True and the
trainer tries to move the model to a GPU if one is available.
- We want this argument to be False because we handle the device map and
model initialization ourselves, based on the calibration needs.
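
The effect of the flag can be illustrated with a minimal, self-contained sketch. The class and function names below (`BaseTrainingArgs`, `OneshotTrainingArgs`, `resolve_device`) are hypothetical stand-ins, not the actual classes touched by this change; only the `place_model_on_device` name comes from the summary above.

```python
from dataclasses import dataclass


@dataclass
class BaseTrainingArgs:
    """Hypothetical stand-in for the trainer's argument class."""

    output_dir: str = "./output"

    @property
    def place_model_on_device(self) -> bool:
        # Default described above: the trainer moves the model itself.
        return True


@dataclass
class OneshotTrainingArgs(BaseTrainingArgs):
    """Hypothetical oneshot arguments that opt out of automatic placement."""

    @property
    def place_model_on_device(self) -> bool:
        # The device map is handled elsewhere, so the trainer must not move the model.
        return False


def resolve_device(args: BaseTrainingArgs, current_device: str, gpu_available: bool) -> str:
    """Mimic the trainer's decision: move to the GPU only when the flag allows it."""
    if args.place_model_on_device and gpu_available:
        return "cuda"
    return current_device
```

With the default arguments, a CPU-offloaded model would be moved to `"cuda"` whenever a GPU is present; with the overriding arguments, it stays wherever the device map put it.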
TEST PLAN:
- `cpu_offloading_fp8.py` ran to completion without issue
- `mult_gpus_int8_device_map` made it past the error and is running