Updated: October 9, 2025
- ✅ Progress bars with tqdm (visual training progress)
- ✅ Better timestamps (readable format: 2025-10-09 10:15:47 instead of ISO)
- ✅ Ratio display (current/total batches and epochs)
- ✅ ETA (estimated time remaining)
- ✅ Tree-style summaries for better readability
```
================================================================================
🤖 LEROBOT ACT TRAINER FOR RC CAR
================================================================================
📁 Data Directory: src/robots/rover/episodes
🔍 Scanning for episodes...
Scanning episodes: 10/88 (11%)
Scanning episodes: 20/88 (22%)
...
✅ Successfully loaded 84 valid episodes
================================================================================
🚀 STARTING TRAINING
================================================================================
📊 Training samples: 9937
📊 Validation samples: 2484
🎯 Total epochs: 30
📦 Batch size: 16
💾 Output directory: outputs/lerobot_act/lerobot_act_20251009_103015
================================================================================
================================================================================
🏃 EPOCH 1/30
================================================================================
Epoch 1/30: 50%|████████████ | 350/701 [02:15<02:15, 2.60it/s] Loss: 0.4523 (LR: 1.00e-04)
```
What it shows:
- Epoch 1/30 - current epoch / total epochs
- 50% - percentage complete
- ████████████ - visual progress bar
- 350/701 - current batch / total batches
- [02:15<02:15] - elapsed time < remaining time
- 2.60it/s - iterations (batches) per second
- Loss: 0.4523 - current average loss
- LR: 1.00e-04 - current learning rate
```
Validating: 100%|█████████████████████████| 155/155 [00:15<00:00, 10.2it/s] Loss: 0.0561
🌟 New best model! Val loss: 0.056080
📊 Epoch 1/30 Summary:
├─ Train Loss: 1.427284
├─ Val Loss: 0.056080
├─ Best Loss: 0.056080
├─ LR: 5.05e-05
└─ Time: 5m 17s
```
Tree-style layout:
- ├─ marks intermediate items
- └─ marks the last item
- Easy to scan visually
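Every field in the progress bar is derived from three numbers: batches done, total batches, and elapsed time. A minimal sketch of how a tqdm-style status line is assembled (the function name and signature are illustrative, not the trainer's actual code):

```python
import time

def progress_line(epoch, total_epochs, n, total, elapsed_s, loss, lr, width=12):
    """Render a tqdm-style line: percent, bar, ratio, elapsed<remaining, rate."""
    frac = n / total
    rate = n / elapsed_s                    # iterations (batches) per second
    remaining_s = (total - n) / rate        # ETA = batches left / current rate
    filled = int(frac * width)
    bar = "█" * filled + " " * (width - filled)

    def mmss(s):                            # e.g. 135 s -> "02:15"
        return time.strftime("%M:%S", time.gmtime(s))

    return (f"Epoch {epoch}/{total_epochs}: {frac:4.0%}|{bar}| {n}/{total} "
            f"[{mmss(elapsed_s)}<{mmss(remaining_s)}, {rate:.2f}it/s] "
            f"Loss: {loss:.4f} (LR: {lr:.2e})")
```

Calling `progress_line(1, 30, 350, 701, 135, 0.4523, 1e-4)` reproduces a line like the epoch bar shown above; tqdm does the same bookkeeping internally and redraws the line in place.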
```
================================================================================
🎉 TRAINING COMPLETED!
================================================================================
📊 Best validation loss: 0.045123
⏱️ Total training time: 2h 45m 32s
📁 Models saved in: outputs/lerobot_act/lerobot_act_20251009_103015
================================================================================
```
Old format:

```
step,epoch,batch_loss,learning_rate,timestamp
0,0,95.94695,0.0001,2025-10-09T10:15:47.602829
```

New format:

```
step,epoch,batch_loss,learning_rate,timestamp
0,0,95.94695,0.0001,2025-10-09 10:15:47
```

✅ Much easier to read!
Old format:

```
epoch,train_loss,val_loss,best_val_loss,learning_rate,epoch_time,total_samples,timestamp
0,1.427,0.056,0.056,5.05e-05,317.65,12421,2025-10-09T10:21:28.149100
```

New format:

```
epoch,train_loss,val_loss,best_val_loss,learning_rate,epoch_time,total_samples,timestamp
0,1.427,0.056,0.056,5.05e-05,317.65,12421,2025-10-09 10:21:28
```

✅ Human-readable timestamps!
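The timestamp change is just a different `strftime` pattern at CSV-write time. A sketch of the logging call (`log_step` is a hypothetical helper; column names follow the CSV above):

```python
import csv
import io
from datetime import datetime

def log_step(writer, step, epoch, batch_loss, lr):
    # Old: datetime.now().isoformat() -> "2025-10-09T10:15:47.602829"
    # New: space-separated, second-resolution format
    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    writer.writerow([step, epoch, f"{batch_loss:.5f}", lr, ts])

# Write one header row and one step row into an in-memory CSV
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["step", "epoch", "batch_loss", "learning_rate", "timestamp"])
log_step(w, 0, 0, 95.94695, 0.0001)
```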
```
Step 0:   Loss = 95.95 ← Random initialization
Step 10:  Loss = 6.57  ← 93% drop in 10 steps!
Step 100: Loss = 2.06  ← 98% drop
Step 500: Loss = 0.55  ← 99.4% drop
Step 770: Loss = 0.24  ← Epoch 0 complete
```
Epoch 0:
- Train Loss: 1.427 (average)
- Val Loss: 0.056 (excellent!)
- Time: 5m 17s
- LR: 1e-4 → 5e-5 (scheduler reduced)
Epoch 1:
- Train Loss: 0.175 (8x better!)
- Val Loss: 0.060 (slight increase, normal)
- Time: 5m 16s
- LR: 5e-5 → 1e-6 (too aggressive!)
- Model learns FAST 🚀
  - 99.7% loss reduction in epoch 0
  - Already converging well
- Not overfitting ✅
  - Val loss (0.056) very close to train loss
  - Good generalization
- Learning rate issue ⚠️
  - Dropped from 1e-4 to 1e-6 in 2 epochs
  - Too aggressive decay
  - Should tune the scheduler
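A drop from 1e-4 to 1e-6 in two epochs means the scheduler is stepping far too fast for a 30-epoch run. One gentler option is cosine annealing spread over the whole run; here is a pure-Python sketch of the schedule (the trainer itself presumably uses a torch scheduler, so names here are illustrative):

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing: decay smoothly from lr_max to lr_min over the full run."""
    t = epoch / max(total_epochs - 1, 1)    # progress through training, 0..1
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

With this schedule the LR after 2 of 30 epochs is still near 1e-4 instead of 1e-6. In PyTorch the equivalent is `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30, eta_min=1e-6)`.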
```
┌─────────────────────────────────┬─────────────────────────────────┐
│ LEFT: Training Progress Bar     │ TOP-RIGHT: GPU Monitor          │
│                                 │                                 │
│ Epoch 12/30: 65%|████▌ | 456/   │ GPU: RTX 3070                   │
│ 701 [03:12<01:45] Loss: 0.123   │ Memory: 4350MB / 8188MB (53%)   │
│                                 │ Utilization: 98%                │
│ Validating: 100%|███████| 155/  │ Temp: 72°C                      │
│ 155 Loss: 0.098                 │ Power: 150W                     │
│                                 │                                 │
│ 📊 Epoch 12/30 Summary:         ├─────────────────────────────────┤
│ ├─ Train Loss: 0.123            │ BOTTOM-RIGHT: CSV Logs          │
│ ├─ Val Loss: 0.098              │                                 │
│ ├─ Best Loss: 0.095             │ step,epoch,loss,lr,timestamp    │
│ ├─ LR: 3.21e-05                 │ 8400,12,0.123,3.21e-05,10:15:47 │
│ └─ Time: 2m 45s                 │ 8410,12,0.121,3.21e-05,10:15:51 │
│                                 │ 8420,12,0.125,3.21e-05,10:15:55 │
└─────────────────────────────────┴─────────────────────────────────┘
```
LEFT (Training):
- ✅ Real-time progress bar
- ✅ Current/total batches
- ✅ Time elapsed & remaining
- ✅ Live loss updates
- ✅ Epoch summaries
TOP-RIGHT (GPU):
- ✅ GPU name & stats
- ✅ Memory usage %
- ✅ GPU utilization %
- ✅ Temperature
- ✅ Power draw
BOTTOM-RIGHT (Logs):
- ✅ CSV being written live
- ✅ Step-by-step metrics
- ✅ Readable timestamps
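The GPU panel above can be fed by polling `nvidia-smi` in query mode and parsing its CSV output. A sketch (the `--query-gpu` fields and `--format=csv,noheader,nounits` flags are real `nvidia-smi` options; the polling loop and panel layout are left to tmux/`watch`):

```python
import subprocess

QUERY = "name,memory.used,memory.total,utilization.gpu,temperature.gpu,power.draw"

def parse_gpu_line(line):
    """Parse one CSV line from `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    name, mem_used, mem_total, util, temp, power = [f.strip() for f in line.split(", ")]
    return {
        "name": name,
        "memory": f"{mem_used}MB / {mem_total}MB ({100 * int(mem_used) // int(mem_total)}%)",
        "utilization": f"{util}%",
        "temp": f"{temp}°C",
        "power": f"{float(power):.0f}W",
    }

def gpu_stats():
    """Query all visible GPUs; returns one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(l) for l in out.strip().splitlines()]
```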
2.60it/s = 2.6 batches/second
→ 701 batches ÷ 2.6 = ~270 seconds = 4.5 minutes per epoch
Progress bar shows: [02:15<02:15]
↑elapsed ↑remaining
Total epoch time = 4m 30s
Watch the loss in the progress bar:
- Decreasing steadily: ✅ Good
- Fluctuating wildly: ⚠️ LR might be too high
- Stuck/flat: ⚠️ LR might be too low, or the model has converged
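Those three cases can be turned into a crude automated check on the recent loss history. A sketch (the thresholds are arbitrary illustrations, not tuned values):

```python
def loss_trend(losses, flat_tol=0.01, noise_tol=0.5):
    """Classify recent batch losses: 'decreasing', 'fluctuating', or 'flat'."""
    if len(losses) < 2:
        return "flat"
    first, last = losses[0], losses[-1]
    rel_drop = (first - last) / max(abs(first), 1e-12)
    # Mean relative step-to-step jump measures noisiness
    jumps = [abs(b - a) / max(abs(a), 1e-12) for a, b in zip(losses, losses[1:])]
    if sum(jumps) / len(jumps) > noise_tol:
        return "fluctuating"   # LR might be too high
    if rel_drop > flat_tol:
        return "decreasing"    # healthy
    return "flat"              # LR too low, or converged
```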
Watch the GPU monitor:
- ~98% utilization: ✅ Perfect, GPU-bound training
- <50% utilization: ⚠️ CPU bottleneck (data loading is slow)
- Memory near max: ⚠️ Risk of OOM
Problem: Running in an environment without a TTY.
Solution: Training still works, just without progress bars; log lines still appear.

Problem: Old CSV files from previous runs.
Solution: New runs use the new format; old files are left unchanged.
Problem: Terminal size or TERM variable.
Solution:

```
export TERM=xterm-256color
# Or adjust ncols in tqdm: ncols=80
```

Solution:

```
# Set environment variable:
export TQDM_DISABLE=1
python training_script.py ...
```

Your new training output includes:
✅ Visual progress bars - See training progress in real-time
✅ Readable timestamps - 2025-10-09 10:15:47 format
✅ Batch/Epoch ratios - 350/701 batches, 12/30 epochs
✅ ETAs - Know when training will finish
✅ Tree summaries - Easy to scan epoch results
✅ Time formatting - 2h 45m 32s instead of 9932.5s
Much better training experience! 🚀
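The time formatting is a small helper along these lines (a sketch; the trainer's own function name may differ):

```python
def format_duration(seconds):
    """Render seconds as '2h 45m 32s' instead of raw '9932.5s'."""
    s = int(seconds)
    h, s = divmod(s, 3600)
    m, s = divmod(s, 60)
    if h:
        return f"{h}h {m}m {s}s"
    if m:
        return f"{m}m {s}s"
    return f"{s}s"
```

For example, `format_duration(9932.5)` gives "2h 45m 32s", matching the total-training-time line above, and `format_duration(317.65)` gives the "5m 17s" shown in the epoch summary.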