- `--benchmark-output`: Path to save benchmark results JSON file (default: `./benchmark_results.json`)
- `--eval-output`: Path to save evaluation metrics JSON file (default: `./evaluation_metrics.json`)

### Testing

- `--smoke-test`: Run a minimal smoke test for CI (overrides other parameters for quick validation)
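The flags above can be combined on the command line. A hypothetical invocation is sketched below; the actual entry-point script name (`train.py` here) is an assumption, since it is not shown in this section.

```shell
# Hypothetical entry point: substitute the repo's actual training script.
python train.py \
  --benchmark-output ./benchmark_results.json \
  --eval-output ./evaluation_metrics.json

# Quick CI validation run:
python train.py --smoke-test
```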
The following commands should be run on `checkers` **every time you create a new terminal**:

```
cd nsfCssiMlClassifier
source envPyTorch.sh
source pgkyl/bin/activate
```

## Model Evaluation Metrics

The model evaluation system measures how well the classifier identifies X-points (magnetic reconnection sites) by treating it as a pixel-level binary classification problem.

### Key Metrics

The evaluation outputs several metrics, saved to JSON files:

- **Accuracy**: Overall pixel classification correctness (can be misleading due to class imbalance)
- **Precision**: Fraction of detected X-points that are correct (measures false-alarm rate)
- **Recall**: Fraction of actual X-points that were found (measures miss rate)
- **F1 Score**: Harmonic mean of precision and recall (balanced performance metric)
- **IoU**: Intersection over Union, the spatial overlap between predicted and actual X-point regions
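These pixel-level metrics follow from the standard confusion-matrix counts. A minimal sketch of how they could be computed from boolean masks (an illustrative helper, not part of the repo's code):

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Pixel-level binary classification metrics for X-point masks.

    pred, truth: arrays of the same shape; True marks an X-point pixel.
    (Illustrative helper; the repo's own evaluation code may differ.)
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)       # correctly detected X-point pixels
    fp = np.sum(pred & ~truth)      # false alarms
    fn = np.sum(~pred & truth)      # missed X-point pixels
    tn = np.sum(~pred & ~truth)     # correctly rejected background
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "iou": iou}
```

Note that accuracy divides by all pixels, which is why it can look deceptively high when X-point pixels are rare.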

### Understanding the Results

**Good performance indicators:**

- F1 Score > 0.8
- IoU > 0.5
- Similar metrics between training and validation sets (no overfitting)
- Low standard deviation across frames (consistent performance)

**Warning signs:**

- A large gap between training and validation metrics (overfitting)
- High precision but low recall (too conservative, missing X-points)
- Low precision but high recall (too aggressive, many false alarms)
- High frame-to-frame variation (inconsistent detection)
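The overfitting check above can be automated by comparing the two metrics files. A minimal sketch, assuming each JSON file holds top-level metric keys such as `"f1"` (the actual key names and file layout may differ):

```python
import json

def overfitting_gap(train_path="train_evaluation_metrics.json",
                    val_path="evaluation_metrics.json",
                    key="f1", threshold=0.1):
    """Flag overfitting when the train-vs-validation gap in a metric
    exceeds `threshold`. Key names here are assumptions."""
    with open(train_path) as f:
        train = json.load(f)
    with open(val_path) as f:
        val = json.load(f)
    gap = train[key] - val[key]
    return gap > threshold, gap
```

The 0.1 threshold is a placeholder; what counts as a "large gap" depends on the dataset and metric.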

### Output Files

After training, the model produces:

- `evaluation_metrics.json`: Validation set performance
- `train_evaluation_metrics.json`: Training set performance
- Performance plots in the `plots/` directory showing: