Commit 0e86a49

Merge pull request #18 from SCOREC/fix-overfitting
Fixed overfitting with dropout, weight decay, and on-the-fly data augmentation.
2 parents 3635d7b + 5f30b94 commit 0e86a49
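Review note: the commit message names three regularization techniques. The sketch below shows, in plain Python, what each one does in principle. It is not code from this commit. The function names (`dropout`, `sgd_step_with_weight_decay`, `random_flip`) are hypothetical, and the choice of a left-right flip as the augmentation and of plain SGD as the optimizer are assumptions made only for illustration.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One SGD step where weight decay adds `weight_decay * w` to each
    gradient, which is the gradient of an L2 penalty on the weights."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


def random_flip(frame, rng):
    """On-the-fly augmentation (assumed variant): flip a 2D frame
    left-right with probability 0.5 each time it is drawn."""
    if rng.random() < 0.5:
        return [row[::-1] for row in frame]
    return [list(row) for row in frame]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.3, rng=rng))
    print(sgd_step_with_weight_decay([1.0, -2.0], [0.1, 0.1], lr=1e-5, weight_decay=5e-4))
    print(random_flip([[1, 2], [3, 4]], rng))
```

Because the augmentation is applied each time a frame is drawn rather than once up front, the model sees a different variant of the data from epoch to epoch, which is the usual motivation for doing it on the fly.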

2 files changed: 201 additions, 25 deletions

README.md

Lines changed: 14 additions & 7 deletions
@@ -85,10 +85,12 @@ run it
 The classifier supports several command line options for training configuration:
 
 ### Training Parameters
-- `--learningRate`: Learning rate for training (default: 1e-4)
-- `--batchSize`: Batch size for training (default: 8)
-- `--epochs`: Number of training epochs (default: 100)
-- `--minTrainingLoss`: Minimum reduction in training loss in orders of magnitude (default: 2, set to 0 to disable check)
+- `--learningRate`: Learning rate for training (default: 1e-5)
+- `--weightDecay`: Weight decay for L2 regularization (default: 5e-4)
+- `--dropoutRate`: Dropout rate for regularization (default: 0.3)
+- `--batchSize`: Batch size for training (default: 1)
+- `--epochs`: Number of training epochs (default: 2000)
+- `--minTrainingLoss`: Minimum reduction in training loss in orders of magnitude (default: 3, set to 0 to disable check)
 
 ### Data Configuration
 - `--trainFrameFirst`: First frame number for training data (default: 1)
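Review note: as a quick way to check the new defaults, the training flags documented in this hunk could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the parser in `XPointMLTest.py`, which this diff does not show. `build_parser` is a hypothetical name, and the argument types (`float` and `int`) are guesses based on the documented defaults.

```python
import argparse


def build_parser():
    """Declare the training flags with the defaults listed in the README diff."""
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    parser.add_argument("--learningRate", type=float, default=1e-5,
                        help="Learning rate for training")
    parser.add_argument("--weightDecay", type=float, default=5e-4,
                        help="Weight decay for L2 regularization")
    parser.add_argument("--dropoutRate", type=float, default=0.3,
                        help="Dropout rate for regularization")
    parser.add_argument("--batchSize", type=int, default=1,
                        help="Batch size for training")
    parser.add_argument("--epochs", type=int, default=2000,
                        help="Number of training epochs")
    # Type is an assumption; the README only says "orders of magnitude".
    parser.add_argument("--minTrainingLoss", type=float, default=3,
                        help="Minimum reduction in training loss in orders of "
                             "magnitude (0 disables the check)")
    return parser


if __name__ == "__main__":
    # Override two of the defaults, as the README's advanced example does.
    args = build_parser().parse_args(["--weightDecay", "1e-3", "--batchSize", "16"])
    print(args)
```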
@@ -102,11 +104,13 @@ The classifier supports several command line options for training configuration:
 - `--use-amp`: Enable automatic mixed precision training for faster training on modern GPUs
 - `--amp-dtype`: Data type for mixed precision (`float16` or `bfloat16`, default: `bfloat16`)
 - `--patience`: Patience for early stopping (default: 15 epochs)
+- `--seed`: Random seed for reproducibility (default: None for non-deterministic)
+- `--require-gpu`: Require GPU to be available, exit if not found
 
 ### Output and Monitoring
 - `--plot`: Enable creation of figures showing ground truth and model-identified X-points
 - `--plotDir`: Directory where figures are written (default: `./plots`)
-- `--checkPointFrequency`: Number of epochs between model checkpoints (default: 10)
+- `--checkPointFrequency`: Number of epochs between model checkpoints (default: 100)
 
 ### Performance Benchmarking
 - `--benchmark`: Enable performance benchmarking (tracks timing, throughput, GPU memory)
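Review note: the new `--seed` flag defaults to None, meaning runs are non-deterministic unless a seed is given. The sketch below illustrates that contract with the standard library only. `make_rng` and `shuffled_frames` are hypothetical helpers, not functions from this repository, and a real training script would likely also need to seed whatever numerical libraries it uses, which this diff does not show.

```python
import random


def make_rng(seed=None):
    """Return a private random generator.

    A fixed integer seed gives repeatable draws; None lets Python seed
    the generator from system entropy, so each run differs."""
    return random.Random(seed)


def shuffled_frames(first, last, seed=None):
    """Return frame numbers first..last (inclusive) in shuffled order."""
    frames = list(range(first, last + 1))
    make_rng(seed).shuffle(frames)
    return frames


if __name__ == "__main__":
    # Same seed, same order: the property --seed 42 is meant to provide.
    print(shuffled_frames(1, 10, seed=42) == shuffled_frames(1, 10, seed=42))
```

Using a private `random.Random` instance rather than the module-level functions keeps the seeded stream from being disturbed by other code that draws random numbers.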
@@ -118,18 +122,21 @@ The classifier supports several command line options for training configuration:
 
 ### Example with Advanced Options
 
-For faster training with mixed precision and early stopping:
-
+For training with custom regularization and reproducibility:
 ```bash
 python -u ${rcRoot}/reconClassifier/XPointMLTest.py \
 --paramFile=/path/to/params.txt \
 --xptCacheDir=/path/to/cache \
 --epochs 200 \
 --learningRate 1e-4 \
+--weightDecay 1e-3 \
+--dropoutRate 0.3 \
 --batchSize 16 \
 --use-amp \
 --amp-dtype bfloat16 \
 --patience 20 \
+--seed 42 \
+--require-gpu \
 --plot \
 --trainFrameLast 100 \
 --validationFrameLast 120
