Commit 14ebfea

Pavel Levin authored and facebook-github-bot committed
Fixing relative threshold mode early stopping working on DDP (#947)
Summary:
Pull Request resolved: #947

`min_delta` needs to be moved to the same device as the metrics to allow multi-device training with relative threshold mode. Without this change, training fails with the following error:
```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

Reviewed By: JKSenthil

Differential Revision: D65969422

fbshipit-source-id: b61b09b458c3851a2acb4a39a8ebf3e43bb15e8c
1 parent 8b4ab19 commit 14ebfea

1 file changed: +1 -1 lines changed

torchtnt/utils/early_stop_checker.py

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ def check(self, val: Union[torch.Tensor, float, int]) -> bool:
         improvement_threshold = self.min_delta
         if self._threshold_mode == "rel":
             base_val = self._best_value if torch.isfinite(self._best_value) else 0.0
-            improvement_threshold = self.min_delta * base_val
+            improvement_threshold = self.min_delta.to(val.device) * base_val
 
         improvement_threshold = improvement_threshold.to(val.device)
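The effect of the changed line can be sketched in isolation. The snippet below is a minimal illustration, not the torchtnt implementation: `relative_threshold` is a hypothetical helper, and the one-element tensor shapes and example values are assumptions chosen for the demo. It moves `min_delta` to the metric's device before multiplying, so the product is never computed across `cuda:0` and `cpu`.

```python
import torch


def relative_threshold(
    min_delta: torch.Tensor, best_value: torch.Tensor, val: torch.Tensor
) -> torch.Tensor:
    """Hypothetical helper mirroring the patched lines of `check`."""
    # Fall back to 0.0 while no finite best value has been recorded.
    base_val = best_value if torch.isfinite(best_value) else 0.0
    # Move min_delta first so the multiplication runs on val's device.
    threshold = min_delta.to(val.device) * base_val
    return threshold.to(val.device)


# Use a GPU when one is present; otherwise the sketch still runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

min_delta = torch.tensor([0.1])                  # assumed to live on CPU
best_value = torch.tensor([2.0], device=device)  # tracked on the metric's device
val = torch.tensor([1.5], device=device)         # incoming metric value

threshold = relative_threshold(min_delta, best_value, val)
print(threshold.device == val.device)
```

On a CPU-only machine every tensor already shares a device, so the mismatch cannot be reproduced there; the sketch only shows that the result ends up on `val.device` either way.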
