Merged
28 commits
6 changes: 4 additions & 2 deletions .github/copilot-instructions.md
@@ -203,9 +203,11 @@ eventdisplay-ml-plot-classification-gamma-efficiency --help

 **Model Architecture**:
 - **Stereo reconstruction**: Multi-output regression (XGBoost)
-  - Targets: `[MCxoff, MCyoff, MCe0]` (x offset, y offset, log energy)
+  - Targets: `[Xoff_residual, Yoff_residual, E_residual]` (residuals relative to DispBDT)
+  - Residuals computed as: MC truth - DispBDT prediction
+  - During inference: final prediction = DispBDT baseline + predicted residual
   - Single model handles all telescope multiplicities (2-4+ telescopes)
-  - Features: Telescope-level arrays + event-level parameters
+  - Features: Telescope-level arrays + event-level parameters (including DispBDT results)

 - **Classification**: Binary classification (XGBoost)
   - Target: Gamma vs hadron (implicit in training data split)
11 changes: 11 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,11 @@
+---
+# Set update schedule for GitHub Actions
+
+version: 2
+updates:
+
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      # Check for updates to GitHub Actions every month
+      interval: "monthly"
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -2,14 +2,14 @@
 repos:
   # Ruff
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.10
+    rev: v0.15.9
     hooks:
       - id: ruff
         args: ["--fix"]
       - id: ruff-format
   # https://pycqa.github.io/isort/docs/configuration/black_compatibility.html#integration-with-pre-commit
   - repo: https://github.com/pycqa/isort
-    rev: 7.0.0
+    rev: 8.0.1
     hooks:
       - id: isort
         args: ["--profile", "black", "--filter-files"]
@@ -34,7 +34,7 @@ repos:
   # - id: actionlint
   # codespell
   - repo: https://github.com/codespell-project/codespell
-    rev: v2.4.1
+    rev: v2.4.2
     hooks:
       - id: codespell
         args: [
270 changes: 270 additions & 0 deletions README.md
@@ -19,6 +19,59 @@ Stereo analysis methods implemented in Eventdisplay provide direction / energies

Output is a single ROOT tree called `StereoAnalysis` with the same number of events as the input tree.

### Training Stereo Reconstruction Models

The stereo regression training pipeline uses multi-target XGBoost to predict residuals (deviations from baseline reconstructions):

**Targets:** `[Xoff_residual, Yoff_residual, E_residual]` (residuals in direction and energy relative to the values reconstructed by the BDT stereo reconstruction method)
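
The residual formulation can be illustrated with a minimal sketch: the training target is MC truth minus the baseline value, and at inference the predicted residual is added back to the baseline. The function names and toy values below are hypothetical and not taken from the package, which presumably operates on arrays rather than plain lists.

```python
def compute_residuals(mc_truth, baseline):
    """Training target: MC truth minus the baseline (BDT stereo) value."""
    return [t - b for t, b in zip(mc_truth, baseline)]


def reconstruct(baseline, predicted_residual):
    """Inference: baseline value plus the residual predicted by the model."""
    return [b + r for b, r in zip(baseline, predicted_residual)]


# Toy values for a single target (e.g. Xoff); purely illustrative.
mc_xoff = [0.50, -0.20, 0.10]
baseline_xoff = [0.45, -0.25, 0.20]

residuals = compute_residuals(mc_xoff, baseline_xoff)
# A perfect residual prediction recovers the MC truth.
recovered = reconstruct(baseline_xoff, residuals)
```

A model that predicts a residual of zero everywhere simply returns the baseline reconstruction, so the baseline acts as the fallback behaviour.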

**Key techniques:**

- **Target standardization:** Targets are mean-centered and scaled to unit variance during training
- **Energy-bin weighting:** Events are weighted inversely by energy bin density; bins with fewer than 10 events are excluded from training to prevent overfitting on low-statistics regions
- **Multiplicity weighting:** Higher-multiplicity events (more telescopes) receive higher sample weights to prioritize high-confidence reconstructions
- **Per-target SHAP importance:** Feature importance values computed during training for each target and cached for later analysis
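
The weighting and standardization steps listed above could look roughly like the following sketch. It uses only the standard library; the function names are hypothetical, and the linear multiplicity weight is an assumption, since the exact weighting function is not stated here.

```python
import statistics
from collections import Counter


def standardize(values):
    """Return (scaled values, mean, std) with zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def energy_bin_weights(bin_indices, min_events=10):
    """Weight each event by 1 / (number of events in its energy bin).

    Events in bins with fewer than `min_events` entries get weight 0,
    which effectively excludes them from training.
    """
    counts = Counter(bin_indices)
    return [
        1.0 / counts[b] if counts[b] >= min_events else 0.0
        for b in bin_indices
    ]


def multiplicity_weights(n_telescopes):
    """ASSUMPTION: the weight grows linearly with the number of telescopes."""
    return [float(n) for n in n_telescopes]


# Toy data: 12 events in bin 0 and 3 events in bin 1 (below the threshold).
bins = [0] * 12 + [1] * 3
n_tel = [2, 3, 4] * 5
weights = [
    e * m
    for e, m in zip(energy_bin_weights(bins), multiplicity_weights(n_tel))
]

scaled, target_mean, target_std = standardize([0.1, -0.3, 0.2, 0.4])
```

The stored `target_mean` and `target_std` are what the apply step needs to map model outputs back to physical units.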

**Training command:**

```bash
eventdisplay-ml-train-xgb-stereo \
--input_file_list train_files.txt \
--model_prefix models/stereo_model \
--max_events 100000 \
--train_test_fraction 0.5 \
--max_cores 8
```

**Output:** Joblib model file containing:

- XGBoost trained model object
- Target standardization scalers (mean/std)
- Feature list and SHAP importance rankings
- Training metadata (random state, hyperparameters)

### Applying Stereo Reconstruction Models

The apply pipeline loads trained models and makes predictions:

**Key safeguards:**

- Invalid energy values (≤0 or NaN) produce NaN outputs but preserve all input event rows
- Missing standardization parameters raise ValueError (prevents silent data corruption)
- Output row count always equals input row count
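
A minimal sketch of these safeguards for the energy target, assuming the model output is a standardized residual on log10(E) and that `target_mean` / `target_std` were stored at training time. The function name and signature are hypothetical and not the package API.

```python
import math


def apply_energy_correction(erec_s, scaled_residuals, target_mean, target_std):
    """Return log10(E) per event; invalid baselines yield NaN, rows are kept."""
    if target_mean is None or target_std is None:
        # Fail loudly rather than apply an unscaled correction.
        raise ValueError("Missing target standardization parameters")

    log10_e = []
    for energy, residual in zip(erec_s, scaled_residuals):
        if math.isnan(energy) or energy <= 0:
            log10_e.append(math.nan)  # keep the row, flag the value
        else:
            # Undo the standardization, then add the residual to the baseline.
            correction = residual * target_std + target_mean
            log10_e.append(math.log10(energy) + correction)
    return log10_e


erec = [1.0, -1.0, float("nan"), 10.0]
out = apply_energy_correction(erec, [0.0, 0.0, 0.0, 1.0], 0.0, 0.1)
```

Because invalid rows are filled with NaN instead of being dropped, `len(out)` always equals `len(erec)`, which keeps the output tree aligned with the input tree.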

**Apply command:**

```bash
eventdisplay-ml-apply-xgb-stereo \
--input_file_list apply_files.txt \
--output_file_list output_files.txt \
--model_prefix models/stereo_model
```


**Output:** ROOT files with `StereoAnalysis` tree containing reconstructed Xoff, Yoff, and log10(E).

## Gamma/hadron separation using XGBoost

Gamma/hadron separation is performed using XGBoost classification trees. Features are image parameters and stereo reconstruction parameters provided by Eventdisplay.
@@ -27,6 +80,223 @@ The zenith angle dependence is accounted for by including the zenith angle as a

Output is a single ROOT tree called `Classification` with the same number of events as the input tree. It contains the classification prediction (`Gamma_Prediction`) and boolean flags (e.g. `Is_Gamma_75` for 75% signal efficiency cut).
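
A flag such as `Is_Gamma_75` can be understood as a score cut chosen so that about 75% of true gamma events pass. The sketch below assumes that higher scores are more gamma-like and derives one global threshold from a toy signal sample; how the package actually determines its thresholds (for example per energy or zenith bin) is not shown here, and the function name is hypothetical.

```python
def efficiency_threshold(signal_scores, efficiency):
    """Score cut that keeps roughly `efficiency` of the signal events."""
    ranked = sorted(signal_scores, reverse=True)
    n_keep = max(1, round(efficiency * len(ranked)))
    return ranked[n_keep - 1]


# Toy classifier scores for simulated gamma events.
gamma_scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.30, 0.20]
cut = efficiency_threshold(gamma_scores, 0.75)
is_gamma_75 = [score >= cut for score in gamma_scores]
```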

## Diagnostic Tools

The committed regression diagnostics in this branch are:

### SHAP feature-importance summary

Tests: Feature importance

- Load per-target SHAP importances cached in the trained model file
- Create one top-20 feature plot per residual target (`Xoff_residual`, `Yoff_residual`, `E_residual`)

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated PNGs

Run:

```bash
eventdisplay-ml-diagnostic-shap-summary \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Outputs:

- `diagnostics/shap_importance_Xoff_residual.png`
- `diagnostics/shap_importance_Yoff_residual.png`
- `diagnostics/shap_importance_E_residual.png`

### Permutation importance

- Rebuild the held-out test split from the model metadata and original input files
- Shuffle one feature at a time and measure the relative RMSE increase per residual target
- Validate predictive dependence on features rather than cached model attribution

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots
- `--top_n`: number of top features to include in the plot (optional)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-permutation-importance \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/ \
--top_n 20
```

Optional override:

```bash
eventdisplay-ml-diagnostic-permutation-importance \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--output_dir diagnostics/
```

Output:

- `diagnostics/permutation_importance.png`

Notes:

- This diagnostic is slower than the SHAP summary because it rebuilds the processed test split.
- It is the better choice when you want to measure actual performance sensitivity to each feature.
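
The idea behind permutation importance can be sketched in a few lines: shuffle one feature column, re-predict, and record the relative RMSE increase. This is a simplified standard-library illustration with a toy `predict` callable, not the diagnostic script itself; all names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error of two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def permutation_importance(predict, rows, y_true, seed=0):
    """Relative RMSE increase after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        permuted = rmse(y_true, [predict(r) for r in shuffled])
        importances.append((permuted - baseline) / baseline)
    return importances


# Toy model that uses feature 0 only; feature 1 is ignored.
def toy_predict(row):
    return 2.0 * row[0]


rows = [[0.1 * i, float((7 * i) % 5)] for i in range(20)]
y_true = [2.0 * r[0] + 0.05 * (-1) ** i for i, r in enumerate(rows)]
importances = permutation_importance(toy_predict, rows, y_true)
```

In this toy case the ignored feature gets an importance of zero, while shuffling the feature the model depends on raises the RMSE.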

### Generalization gap

- Read the cached train/test RMSE summary written during training
- Compare final train and test RMSE for each residual target
- Quantify the overfitting gap after training is complete

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-generalization-gap \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Optional override:

```bash
eventdisplay-ml-diagnostic-generalization-gap \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--output_dir diagnostics/
```

Output:

- `diagnostics/generalization_gap.png`

Notes:

- This diagnostic measures final overfitting by comparing train and test residual RMSE.
- Older model files without cached metrics fall back to rebuilding the original train/test split.
- Unlike `plot_training_evaluation.py`, it summarizes final RMSE, not the per-iteration XGBoost training history.
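
As a small worked example of the quantity being reported, the sketch below computes train and test RMSE for one residual target and the resulting gap. The numbers are toy values and the helper names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def generalization_gap(train_rmse, test_rmse):
    """Return (absolute gap, gap relative to the train RMSE)."""
    gap = test_rmse - train_rmse
    return gap, gap / train_rmse


# Toy residuals: the model fits the training events more closely.
train_true, train_pred = [0.10, -0.20, 0.05, 0.00], [0.08, -0.18, 0.06, 0.01]
test_true, test_pred = [0.12, -0.15, 0.02, 0.07], [0.08, -0.19, 0.05, 0.03]

train_rmse = rmse(train_true, train_pred)
test_rmse = rmse(test_true, test_pred)
gap, relative_gap = generalization_gap(train_rmse, test_rmse)
```

A relative gap close to zero suggests the model generalizes well; a large positive value points to overfitting.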

### Partial Dependence Plots

- Visualize how each feature influences model predictions
- Check that the model reflects the expected physics: corrections should shrink with increasing multiplicity, and baseline features should show smooth relationships

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots (optional; default: `diagnostics`)
- `--features`: space-separated list of features to plot (optional; default: `DispNImages Xoff_weighted_bdt Yoff_weighted_bdt ErecS`)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-partial-dependence \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/ \
--features DispNImages Xoff_weighted_bdt ErecS
```

Optional override:

```bash
eventdisplay-ml-diagnostic-partial-dependence \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--features Xoff_weighted_bdt Yoff_weighted_bdt
```

Output:

- `diagnostics/partial_dependence.png` (grid of feature × target subplots)

Notes:

- PDP displays predicted residual output as a function of a single feature while holding others constant
- Multiplicity effect: high-multiplicity events should show smaller corrections (negative slope)
- Baseline stability: baseline features (e.g., `weighted_bdt`) should show smooth, linear relationships
- This diagnostic rebuilds the held-out test split and is slower than SHAP summary
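
The partial-dependence computation itself is simple and can be sketched as follows: force one feature to each grid value for all events, predict, and average. The toy model and names are hypothetical; the toy prediction is constructed so that the correction shrinks with multiplicity.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction with one feature forced to each grid value."""
    curve = []
    for value in grid:
        modified = [
            r[:feature_index] + [value] + r[feature_index + 1:] for r in rows
        ]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


# Toy model: feature 0 is the multiplicity, feature 1 a baseline offset.
def toy_predict(row):
    return row[1] / row[0]


rows = [[2.0, 0.4], [3.0, 0.2], [4.0, 0.6]]
curve = partial_dependence(toy_predict, rows, 0, [2.0, 3.0, 4.0])
```

The resulting curve decreases with multiplicity, which is the negative slope described in the notes above.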

### Residual Normality Diagnostics

- Validate that model residuals follow a normal distribution
- Detect outlier events and check for systematic biases in reconstruction errors

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots (optional; default: `diagnostics`)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-residual-normality \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Optional override:

```bash
eventdisplay-ml-diagnostic-residual-normality \
--model_file models/stereo_model.joblib \
--input_file_list files.txt
```

Output:

- Residual normality statistics printed to console:
- Mean and standard deviation per target
- Kolmogorov-Smirnov test p-value (normality test)
- Anderson-Darling test statistic and critical value
- Skewness and kurtosis
- Q-Q plot R² value
- Number of outliers (>3σ) per target
- `diagnostics/residual_diagnostics.png` (single 2xN grid; generated on cache miss when reconstruction is required)

Notes:

- Residual normality stats are cached during training and loaded from the model file for fast retrieval
- Diagnostic plots (histograms, Q-Q plots) are only generated when the split must be reconstructed
- Invalid KS test or Anderson-Darling results (NaN/inf) are reported as special values
- Outlier counts help identify events with unusually large reconstruction errors
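
Some of the listed statistics can be reproduced with the standard library alone, as in the sketch below (mean, standard deviation, skewness, kurtosis, 3σ outlier count, Q-Q R²). The Kolmogorov-Smirnov and Anderson-Darling tests are omitted. The function names are hypothetical, and reporting excess kurtosis (zero for a Gaussian) is an assumption about the convention.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_summary(residuals):
    """Moment-based normality indicators for one residual target."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    z = [(r - mean) / std for r in residuals]

    # Theoretical normal quantiles at plotting positions (i + 0.5) / n.
    normal = statistics.NormalDist()
    quantiles = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]

    return {
        "mean": mean,
        "std": std,
        "skewness": sum(v**3 for v in z) / n,
        "excess_kurtosis": sum(v**4 for v in z) / n - 3.0,
        "n_outliers_3sigma": sum(abs(v) > 3.0 for v in z),
        # Squared correlation between sorted residuals and normal quantiles.
        "qq_r2": pearson_r(quantiles, sorted(z)) ** 2,
    }


# Toy residuals drawn from a Gaussian; real ones come from the test split.
rng = random.Random(1)
toy = [rng.gauss(0.0, 0.1) for _ in range(500)]
summary = residual_summary(toy)
```

For Gaussian-like residuals the Q-Q R² is close to 1 and both skewness and excess kurtosis are close to 0; heavy tails show up as positive excess kurtosis and a larger outlier count.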

### Training-evaluation curves

- Plot XGBoost training vs validation metric curves
- Useful for checking convergence and overfitting behavior

Required inputs:

- `--model_file`: trained model `.joblib` containing an XGBoost model
- `--output_file`: output image path (optional; if omitted, plot is shown interactively)

Run:

```bash
eventdisplay-ml-plot-training-evaluation \
--model_file models/stereo_model.joblib \
--output_file diagnostics/training_curves.png
```

Output:

- Figure with one panel per tracked metric (for example `rmse`), showing training and test curves.

## Generative AI disclosure

Generative AI tools (including Claude, ChatGPT, and Gemini) were used to assist with code development, debugging, and documentation drafting. All AI-assisted outputs were reviewed, validated, and, where necessary, modified by the authors to ensure accuracy and reliability.
23 changes: 23 additions & 0 deletions docs/changes/53.feature.md
@@ -0,0 +1,23 @@
* **Algorithm improvements**

* Switch to residual learning (predict corrections to baseline reconstructions)
* Add target standardization for balanced multi-target training
* Introduce energy-bin weighting with low-statistics suppression
* Refine XGBoost training (regularization, early stopping, updated hyperparameters)

* **New features**

* Training diagnostics with cached metrics (generalization gap, residual normality)
* SHAP feature importance caching per target
* Diagnostic scripts and CLI tools for evaluation and interpretability
* Reproducible diagnostics via model metadata reconstruction
* Expanded test suite and improved error handling

* **Bug fixes**

* Correct log10 handling for energy residuals
* Fix scaler loading/inversion in apply pipeline
* Fix energy-bin weighting logic
* Ensure safe energy validation (ErecS) without dropping rows
* Align evaluation metrics with residual formulation
* Resolve pandas/sklearn warnings and compatibility issues
6 changes: 6 additions & 0 deletions pyproject.toml
@@ -62,8 +62,14 @@ urls."documentation" = "https://github.com/Eventdisplay/Eventdisplay-ML"
 urls."repository" = "https://github.com/Eventdisplay/Eventdisplay-ML"
 scripts.eventdisplay-ml-apply-xgb-classify = "eventdisplay_ml.scripts.apply_xgb_classify:main"
 scripts.eventdisplay-ml-apply-xgb-stereo = "eventdisplay_ml.scripts.apply_xgb_stereo:main"
+scripts.eventdisplay-ml-diagnostic-generalization-gap = "eventdisplay_ml.scripts.diagnostic_generalization_gap:main"
+scripts.eventdisplay-ml-diagnostic-partial-dependence = "eventdisplay_ml.scripts.diagnostic_partial_dependence:main"
+scripts.eventdisplay-ml-diagnostic-permutation-importance = "eventdisplay_ml.scripts.diagnostic_permutation_importance:main"
+scripts.eventdisplay-ml-diagnostic-residual-normality = "eventdisplay_ml.scripts.diagnostic_residual_normality:main"
+scripts.eventdisplay-ml-diagnostic-shap-summary = "eventdisplay_ml.scripts.diagnostic_shap_summary:main"
 scripts.eventdisplay-ml-plot-classification-performance-metrics = "eventdisplay_ml.scripts.plot_classification_performance_metrics:main"
 scripts.eventdisplay-ml-plot-classification-gamma-efficiency = "eventdisplay_ml.scripts.plot_classification_gamma_efficiency:main"
+scripts.eventdisplay-ml-plot-training-evaluation = "eventdisplay_ml.scripts.plot_training_evaluation:main"
 scripts.eventdisplay-ml-train-xgb-classify = "eventdisplay_ml.scripts.train_xgb_classify:main"
 scripts.eventdisplay-ml-train-xgb-stereo = "eventdisplay_ml.scripts.train_xgb_stereo:main"

3 changes: 3 additions & 0 deletions src/eventdisplay_ml/config.py
@@ -239,5 +239,8 @@ def configure_apply(analysis_type):
     )
     model_configs["energy_bins_log10_tev"] = par.get("energy_bins_log10_tev", [])
     model_configs["zenith_bins_deg"] = par.get("zenith_bins_deg", [])
+    if analysis_type == "stereo_analysis":
+        model_configs["target_mean"] = par.get("target_mean")
+        model_configs["target_std"] = par.get("target_std")

     return model_configs