Recognizer: Incorporate EMA (#922)

Victor Bourgin · facebook-github-bot · commit 6d0c0782471d · 2024-10-10T17:37:48.000-07:00
Summary:

Add EMA to the recognizer:
- Separate learning rate scheduler updates from EMA model updates. In d2go, the EMA weights were updated every step, while the scheduler was updated every epoch. We separate the two to reproduce that behavior in Vizard, overriding `on_train_step_end` so that the EMA weights are updated every step (irrespective of other parameters).
- Update torchtnt `auto_unit` to use `self.device` for the EMA / SWA model. `self.device` may be set from the environment in the superclass init, whereas the `device` argument may not be. This enables model evaluation on GPU.
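
For illustration, a minimal sketch of both changes in plain Python (no torch). Every name in it (`ToyBaseUnit`, `ToyEMAUnit`, `TOY_DEVICE`, `on_train_epoch_end`, `run`) is a hypothetical stand-in rather than a Vizard or torchtnt API. Only the hook name `on_train_step_end` comes from the summary above, and its signature here is assumed.

```python
import os
from typing import Optional


class ToyBaseUnit:
    """Hypothetical superclass that resolves the device in __init__."""

    def __init__(self, device: Optional[str] = None) -> None:
        # Assumption: when no device is passed, fall back to the environment.
        # TOY_DEVICE is a made-up variable used only in this sketch.
        if device is None:
            device = os.environ.get("TOY_DEVICE", "cpu")
        self.device = device


class ToyEMAUnit(ToyBaseUnit):
    """Hypothetical unit with step-wise EMA and epoch-wise LR scheduling."""

    def __init__(
        self,
        device: Optional[str] = None,
        ema_decay: float = 0.9,
        lr: float = 1.0,
        lr_gamma: float = 0.5,
    ) -> None:
        super().__init__(device)
        # Use the resolved self.device, not the raw (possibly None) argument,
        # so the averaged model lives where the superclass decided.
        self.ema_device = self.device
        self.ema_decay = ema_decay
        self.lr = lr
        self.lr_gamma = lr_gamma
        self.ema_weight = 0.0
        self.ema_updates = 0
        self.scheduler_updates = 0

    def on_train_step_end(self, new_weight: float) -> None:
        # The EMA follows the weights after every training step.
        self.ema_weight = (
            self.ema_decay * self.ema_weight
            + (1.0 - self.ema_decay) * new_weight
        )
        self.ema_updates += 1

    def on_train_epoch_end(self) -> None:
        # The learning rate scheduler advances once per epoch only.
        self.lr *= self.lr_gamma
        self.scheduler_updates += 1


def run(unit: ToyEMAUnit, epochs: int, steps_per_epoch: int) -> ToyEMAUnit:
    """Drive the two hooks at their separate frequencies."""
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            unit.on_train_step_end(new_weight=1.0)
        unit.on_train_epoch_end()
    return unit
```

With 2 epochs of 3 steps each, this sketch performs 6 EMA updates and 2 scheduler updates. Passing `device=None` leaves `ema_device` equal to whatever the superclass resolved.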

Differential Revision: D64206735
diff --git a/torchtnt/framework/auto_unit.py b/torchtnt/framework/auto_unit.py
@@ -512,7 +512,7 @@ def __init__(
 
             self.swa_model = AveragedModel(
                 module_for_swa,
-                device=device,
+                device=self.device,
                 use_buffers=swa_params.use_buffers,
                 averaging_method=swa_params.averaging_method,
                 ema_decay=swa_params.ema_decay,