Commit f38cd6f
Add remote training status monitoring script
Add check_remote_status.sh and its supporting Python script, check_training_status.py,
to monitor training progress on remote GPU servers.
Features:
- Connects to remote server using saved credentials
- Reports completion status for all model variants (baseline, content, function, POS)
- Shows mean ± std of final training losses for completed models
- Displays current epoch, loss, and estimated time to completion for in-progress models
- Computes the ETA from actual elapsed runtime, read from training log timestamps
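
The "mean ± std of final training losses" summary could be computed roughly as in the sketch below. This is a minimal illustration, not the code from check_training_status.py: the function names `summarize_final_losses` and `format_summary` are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics


def summarize_final_losses(final_losses):
    """Return (mean, std) for a list of final losses, one per completed run.

    Assumption: sample standard deviation; a single run reports std = 0.0.
    """
    mean = statistics.mean(final_losses)
    std = statistics.stdev(final_losses) if len(final_losses) > 1 else 0.0
    return mean, std


def format_summary(final_losses):
    """Format the summary as 'mean ± std (n=count)' for display."""
    mean, std = summarize_final_losses(final_losses)
    return f"{mean:.4f} ± {std:.4f} (n={len(final_losses)})"
```

For example, `format_summary([1.0, 2.0, 3.0])` yields `2.0000 ± 1.0000 (n=3)`.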
Implementation:
- Bash wrapper (check_remote_status.sh) handles SSH connection and cluster selection
- Python analyzer (check_training_status.py) parses model directories and loss logs
- Extracts training start time from log files for accurate elapsed time calculation
- Calculates per-epoch training time and estimates remaining duration
- Supports both local and remote execution
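
The elapsed-time and ETA steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the log timestamp format (`YYYY-MM-DD HH:MM:SS` at the start of the first log line) and the names `parse_start_time` and `estimate_remaining` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of a log line (hypothetical).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # length of "YYYY-MM-DD HH:MM:SS"


def parse_start_time(first_log_line):
    """Read the training start time from the first line of a log file."""
    return datetime.strptime(first_log_line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def estimate_remaining(start_time, now, epochs_done, total_epochs):
    """Estimate remaining time from the average time per completed epoch.

    Returns None when no epoch has finished yet, since there is no rate to use.
    """
    if epochs_done <= 0:
        return None
    per_epoch = (now - start_time) / epochs_done
    return per_epoch * (total_epochs - epochs_done)
```

With a start time of 00:00, a current time of 02:00, and 4 of 10 epochs done, the average is 30 minutes per epoch and the estimate is 3 hours remaining.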
Documentation:
- Added comprehensive usage instructions to README.md
- Example output showing completed and in-progress model statistics
- Integrated with existing remote_train.sh credential system
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
1 parent b8825c8 commit f38cd6f
3 files changed: +768 -0 lines