
Commit 2c0277a

Updating README to identify what Model Evaluation Metrics mean
1 parent 426184e commit 2c0277a


README.md

Lines changed: 53 additions & 1 deletion
@@ -108,6 +108,11 @@ The classifier supports several command line options for training configuration:
- `--plotDir`: Directory where figures are written (default: `./plots`)
- `--checkPointFrequency`: Number of epochs between model checkpoints (default: 10)

### Performance Benchmarking
- `--benchmark`: Enable performance benchmarking (tracks timing, throughput, GPU memory)
- `--benchmark-output`: Path to save benchmark results JSON file (default: `./benchmark_results.json`)
- `--eval-output`: Path to save evaluation metrics JSON file (default: `./evaluation_metrics.json`); a sketch of reading these JSON outputs back follows this list
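
The benchmark and evaluation outputs are plain JSON, so they can be inspected after a run without retraining. Below is a minimal Python sketch of reading the default output files back; it assumes each file holds a flat JSON object and does not rely on any particular key names from this repo.

```python
# Minimal sketch: read back the JSON written via --benchmark-output and
# --eval-output (default paths taken from the flag descriptions above).
# Assumes each file holds a flat JSON object; the key names are whatever
# the training run wrote, not a schema documented here.
import json
from pathlib import Path

for path in (Path("benchmark_results.json"), Path("evaluation_metrics.json")):
    if not path.exists():
        continue
    with path.open() as f:
        data = json.load(f)
    print(f"--- {path} ---")
    for key, value in data.items():
        print(f"{key}: {value}")
```
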
### Testing
- `--smoke-test`: Run minimal smoke test for CI (overrides other parameters for quick validation)

@@ -138,4 +143,51 @@ The following commands should be run on `checkers` **every time you create a new
cd nsfCssiMlClassifier
source envPyTorch.sh
source pgkyl/bin/activate
```
## Model Evaluation Metrics
The model evaluation system measures how well the classifier identifies X-points (magnetic reconnection sites) by treating detection as a pixel-level binary classification problem.
### Key Metrics
The evaluation outputs several metrics, saved to JSON files; a sketch of how they are computed from pixel masks follows this list:
- **Accuracy**: Overall pixel classification correctness (can be misleading due to class imbalance)
- **Precision**: Fraction of detected X-points that are correct (low precision means many false alarms)
- **Recall**: Fraction of actual X-points that were found (low recall means many missed X-points)
- **F1 Score**: Harmonic mean of precision and recall (balanced performance metric)
- **IoU**: Intersection over Union - spatial overlap quality between predicted and actual X-point regions
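
All of these reduce to counting true/false positives and negatives over pixels. The following is a hypothetical NumPy sketch of those definitions, not the repo's evaluation code; `pred` and `truth` are assumed to be binary masks of the same shape.

```python
# Hypothetical sketch of the pixel-level metrics; not the repo's evaluation code.
# `pred` and `truth` are assumed to be same-shape binary (0/1) NumPy arrays.
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # predicted X-point pixels that are real
    fp = np.sum(pred & ~truth)   # false alarms
    fn = np.sum(~pred & truth)   # missed X-point pixels
    tn = np.sum(~pred & ~truth)  # correctly ignored background

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0,
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
    }
```
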
### Understanding the Results
**Good performance indicators:**
- F1 Score > 0.8
- IoU > 0.5
- Similar metrics between training and validation sets (no overfitting)
- Low standard deviation across frames (consistent performance)

**Warning signs** (a minimal train/validation comparison is sketched after this list):
- Large gap between training and validation metrics (overfitting)
- High precision but low recall (too conservative, missing X-points)
- Low precision but high recall (too aggressive, many false alarms)
- High frame-to-frame variation (inconsistent detection)
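
A simple way to act on these rules of thumb is to compare the training and validation JSON outputs directly. The sketch below is hypothetical: it assumes the files expose flat numeric keys such as `f1` and `iou`, which may not match the repo's actual schema.

```python
# Hypothetical overfitting / consistency check. Assumes flat numeric keys
# such as "precision", "recall", "f1", "iou" exist in both JSON files; the
# actual key names written by the training code may differ.
import json

with open("train_evaluation_metrics.json") as f:
    train = json.load(f)
with open("evaluation_metrics.json") as f:
    val = json.load(f)

for key in ("precision", "recall", "f1", "iou"):
    t, v = train.get(key), val.get(key)
    if t is None or v is None:
        continue
    note = "  <-- large train/val gap, possible overfitting" if t - v > 0.1 else ""
    print(f"{key:10s} train={t:.3f}  val={v:.3f}{note}")
```
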
### Output Files
After training, the model produces:
- `evaluation_metrics.json`: Validation set performance
- `train_evaluation_metrics.json`: Training set performance
- Performance plots in the `plots/` directory showing:
  - Training history (loss curves)
  - Model predictions vs ground truth
  - True positives (green), false positives (red), false negatives (yellow); a sketch of building such an overlay follows this list
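
If the predicted and ground-truth masks are available, a color-coded comparison of the kind described above can be rebuilt outside the training pipeline. This is an illustrative sketch, not the repo's plotting code; the resulting array can be passed to `matplotlib.pyplot.imshow`.

```python
# Illustrative sketch of a TP/FP/FN overlay (not the repo's plotting code).
# `pred` and `truth` are assumed to be same-shape 2D binary masks.
import numpy as np

def overlay(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    pred, truth = pred.astype(bool), truth.astype(bool)
    img = np.zeros(pred.shape + (3,))      # RGB canvas, black background
    img[pred & truth] = (0.0, 1.0, 0.0)    # true positives  -> green
    img[pred & ~truth] = (1.0, 0.0, 0.0)   # false positives -> red
    img[~pred & truth] = (1.0, 1.0, 0.0)   # false negatives -> yellow
    return img
```
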
### Physics Context
For reconnection studies:
- **High recall is critical**: Missing X-points means missing reconnection events
- **Precision affects analysis**: False positives corrupt downstream calculations
- **IoU indicates localization**: Poor IoU means inaccurate X-point positions
The model uses a 9×9 pixel expansion around X-points to account for localization uncertainty while still requiring accurate region identification.
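
As an illustration of that expansion, assuming ground-truth X-points arrive as single marked pixels in a 2D binary mask (the repo's actual implementation may differ), a 9×9 dilation with `scipy.ndimage` is one way to write it:

```python
# Sketch of expanding single-pixel X-point labels into 9x9 regions.
# Assumes a 2D binary mask with one pixel per X-point; this illustrates
# the idea and is not necessarily how the repo implements it.
import numpy as np
from scipy.ndimage import binary_dilation

def expand_xpoints(xpoint_mask: np.ndarray, size: int = 9) -> np.ndarray:
    structure = np.ones((size, size), dtype=bool)  # 9x9 square neighborhood
    return binary_dilation(xpoint_mask.astype(bool), structure=structure)
```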
