Commit 5e8f244
Fix test_train to not rely on 'Sanity Checking' stdout in multi-GPU runs (#10478)
The test `test_train` previously asserted on the presence of the "Sanity
Checking" message in stdout. This was brittle because in
multi-GPU/DistributedDataParallel runs, **only rank 0 prints this
message**, so tests running on other ranks failed.
This PR updates the test to:
- Remove the fragile stdout assertion.
- Assert trainer state (`not trainer.sanity_checking`, `trainer.current_epoch >= 0`).
- Use LoggerCallback to verify that both training and validation ran.
This makes the test deterministic and robust across single-GPU,
multi-GPU, and CI environments.
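
The pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this commit: `ToyTrainer`, `RecordingCallback`, and `check_training_ran` are hypothetical stand-ins for the real trainer, `LoggerCallback`, and test body, written only to show why asserting on state and callback counters works on every rank while a stdout assertion does not.

```python
class RecordingCallback:
    """Hypothetical stand-in for a callback such as LoggerCallback."""

    def __init__(self):
        self.train_epochs = 0
        self.val_epochs = 0

    def on_train_epoch_end(self):
        self.train_epochs += 1

    def on_validation_epoch_end(self):
        self.val_epochs += 1


class ToyTrainer:
    """Hypothetical stand-in for a trainer; this is not the Lightning API."""

    def __init__(self, rank, callbacks, max_epochs=1):
        self.rank = rank
        self.callbacks = callbacks
        self.max_epochs = max_epochs
        self.sanity_checking = False
        self.current_epoch = 0

    def fit(self):
        # Sanity check phase: only rank 0 prints, as described above.
        self.sanity_checking = True
        if self.rank == 0:
            print("Sanity Checking")
        self.sanity_checking = False

        # Every rank runs training and validation and fires the hooks.
        for epoch in range(self.max_epochs):
            self.current_epoch = epoch
            for cb in self.callbacks:
                cb.on_train_epoch_end()
                cb.on_validation_epoch_end()


def check_training_ran(rank):
    """Assert on trainer state and callback counters, never on stdout."""
    cb = RecordingCallback()
    trainer = ToyTrainer(rank, [cb])
    trainer.fit()
    assert not trainer.sanity_checking
    assert trainer.current_epoch >= 0
    assert cb.train_epochs > 0 and cb.val_epochs > 0
    return True
```

Calling `check_training_ran(0)` and `check_training_ran(1)` both succeed in this sketch, whereas a check for "Sanity Checking" in captured stdout would only pass for rank 0.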
[PassingLog.TXT](https://github.com/user-attachments/files/22671419/PassingLog.TXT)
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
1 parent 1252027
1 file changed, +23 -6 lines changed