This repository brings together the components of the analysis workflow. It processes the NanoAOD samples output by the r3k-preselBDT repository, applies the analysis selections, incorporates MC scale factors, prepares files for the final fits in r3k-fitter, and builds the acceptance × efficiency table for the signal and control regions used in the final measurement. It also includes the machinery to perform cross-check ratio calculations using these efficiencies and the fitted yields.
Several installation methods are possible for this codebase. Only ROOT and the Python packages listed below are required, together with an environment that can run them inside a Jupyter notebook. You can install a CMSSW release (CMSSW_15_0_1 is recommended) or build a standalone environment with ROOT. An LCG view is most likely the easiest and quickest way to access all of these tools.
Ensure you have a Python environment (e.g., an LCG view or Conda) with the following libraries:
pip install pandas numpy matplotlib seaborn uncertainties pyyaml
Note: ROOT is required for the preselection scripts but not strictly for the plotting/tables modules.
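Before launching Jupyter, it can help to confirm that every package is importable. The snippet below is a minimal, standard-library-only sketch. It assumes the import names shown: pyyaml is imported as `yaml`, and ROOT is treated as optional, following the note above. The helper name `missing_modules` is hypothetical and not part of the repository.

```python
"""Hypothetical environment check: report which analysis dependencies are missing."""
import importlib.util

# Import names, not pip names: pyyaml is imported as "yaml".
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "uncertainties", "yaml"]
# ROOT is only needed for the preselection scripts.
OPTIONAL = ["ROOT"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    for name in missing_modules(OPTIONAL):
        print(f"Optional package not found: {name}")
```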
.
├── config/
│ ├── cuts.yml # Definition of preselection and BDT cuts
│ ├── samples.yml # File paths and metadata for MC/Data samples
│ └── xcheck_params.yml # Constants (Yields/Effs/PDG) for physics cross-checks
├── data/ # Local data storage (created when the scripts are run)
│ ├── logs/ # CSV cutflow logs generated by scripts/
│ └── <outputs>/ # Local ROOT files output by the processing steps (consider storing these on EOS)
├── notebooks/
│ └── tables_maker.ipynb # Jupyter notebook for acceptance × efficiency tables
├── scripts/
│ ├── 01_apply_preselection.py # Step 1: Flat skimming & Preselection
│ ├── 02_apply_bdt_selection.py # Step 2: BDT Inference & Categorization
│ └── rk_xchecks.py # Step 3: R(K) Cross-checks & Sensitivity
└── README.md
The analysis follows a sequential workflow: the logs produced in Steps 1 and 2 are consumed by Steps 3 and 4.
Script: scripts/01_apply_preselection.py
Filters raw samples based on trigger paths, muon/electron validity, and kinematic preselection (defined in config/cuts.yml).
- Input: Central MiniAOD/NanoAOD samples.
- Output: Skimmed ROOT files (data/test_output_...) and a cutflow log (data/logs/cutflow_step1_*.csv).
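The cutflow log records how many events survive each successive cut. Below is a minimal standard-library sketch of that bookkeeping. It assumes events are available as dictionaries and that each cut can be written as a named predicate. The branch names (`trig_pass`, `l1_pt`, `l2_pt`), the thresholds, and the CSV columns are hypothetical placeholders, not the definitions in config/cuts.yml or the script's actual log format.

```python
"""Sketch of sequential cuts with a cutflow CSV (hypothetical branches and cuts)."""
import csv

# Hypothetical cuts, applied in order; the real ones live in config/cuts.yml.
CUTS = [
    ("trigger", lambda ev: ev["trig_pass"]),
    ("lepton_pt", lambda ev: ev["l1_pt"] > 5.0 and ev["l2_pt"] > 5.0),
]


def apply_cutflow(events, cuts):
    """Apply `cuts` sequentially; return surviving events and one cutflow row per cut."""
    n_initial = len(events)
    rows = [{"cut": "initial", "n_pass": n_initial, "eff_step": 1.0, "eff_cumulative": 1.0}]
    for name, predicate in cuts:
        n_before = len(events)
        events = [ev for ev in events if predicate(ev)]
        n_after = len(events)
        rows.append({
            "cut": name,
            "n_pass": n_after,
            # Guard against division by zero once every event has been rejected.
            "eff_step": n_after / n_before if n_before else 0.0,
            "eff_cumulative": n_after / n_initial if n_initial else 0.0,
        })
    return events, rows


def write_cutflow(rows, path):
    """Write cutflow rows to a CSV file with a header taken from the row keys."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

In a ROOT-based workflow, a comparable cutflow is usually obtained from `ROOT.RDataFrame` by giving each `Filter` a name and calling `Report()`.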
Script: scripts/02_apply_bdt_selection.py
Uses the XGBoost discriminator score to separate signal from background. Events passing the BDT score cut are categorized into analysis bins.
- Input: Skimmed ROOT files from Step 1, after BDT inference has been run on them.
- Output: Final selection ROOT files and a BDT efficiency log (data/logs/cutflow_step2_*.csv).
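Conceptually, this step keeps events whose `bdt_score` exceeds a working point and then assigns each survivor to an analysis region. A minimal sketch of that logic follows. The threshold, the `q2` field, and the region windows are hypothetical placeholders, not the analysis definitions, and the script's actual categorization may rely on different variables.

```python
"""Sketch of a BDT score cut followed by region categorization (placeholder values)."""

BDT_CUT = 0.9  # hypothetical working point
# Hypothetical, non-overlapping q^2 windows in GeV^2; not the analysis definitions.
REGIONS = {"lowq2": (1.1, 6.0), "jpsi": (8.0, 11.0)}


def categorize(events, bdt_cut=BDT_CUT, regions=REGIONS):
    """Return ({region: [events]}, bdt_efficiency) for events passing the BDT cut."""
    passing = [ev for ev in events if ev["bdt_score"] > bdt_cut]
    categories = {name: [] for name in regions}
    for ev in passing:
        for name, (low, high) in regions.items():
            if low <= ev["q2"] < high:
                categories[name].append(ev)
                break  # windows do not overlap, so stop at the first match
    bdt_eff = len(passing) / len(events) if events else 0.0
    return categories, bdt_eff
```

Events that pass the BDT cut but fall outside every window are simply not assigned to any region in this sketch.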
Before running this step, run the analysis BDT on the skimmed samples from Step 1. This produces a bdt_score branch in the output ntuples. For the Step 2 script (02_apply_bdt_selection.py) to locate and process the files containing BDT scores, organize the input ROOT files using the following naming convention and directory structure.
Most importantly, the _wScores file suffix identifies the ntuples that carry the BDT information. You can place these files in the existing region-specific subdirectories or in a dedicated {region}_wScores directory, as long as the file names carry the suffix. The script looks for ROOT files in the directory structure defined by your TRIGGER_TAG. Place your files there before running the script:
data/
└── test_output_{TRIGGER_TAG}/ <-- e.g., data/test_output_mix/
├── jpsi/
│ ├── {SampleName}_flat_skimmed_{TRIGGER_TAG}.root
│ └── {SampleName}_flat_skimmed_{TRIGGER_TAG}_wScores.root <-- This file will be used as the Step 2 input
└── jpsi_wScores/ <-- Optionally the script will look for the BDT files here as well
└── {SampleName}_flat_skimmed_{TRIGGER_TAG}_wScores.root
If the files are missing or placed in the wrong subfolders (e.g., all in one folder instead of split into jpsi/ and lowq2/), the script will fail to categorize the events correctly.
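The lookup rules above can be sketched with pathlib. For a given TRIGGER_TAG and region, the helper collects files ending in `_flat_skimmed_{TRIGGER_TAG}_wScores.root` from both `{region}/` and `{region}_wScores/`. This is an illustrative helper (`find_bdt_inputs` is a hypothetical name), not the script's own implementation. It returns every match and does not decide which copy wins if the same sample appears in both directories.

```python
"""Sketch: locate Step 2 inputs that carry BDT scores (_wScores suffix)."""
from pathlib import Path


def find_bdt_inputs(trigger_tag, region, base_dir="data"):
    """Return sorted *_wScores.root paths for one region and trigger tag."""
    root = Path(base_dir) / f"test_output_{trigger_tag}"
    pattern = f"*_flat_skimmed_{trigger_tag}_wScores.root"
    candidates = []
    # Search the region directory and the optional {region}_wScores directory.
    for subdir in (root / region, root / f"{region}_wScores"):
        if subdir.is_dir():
            candidates.extend(subdir.glob(pattern))
    return sorted(candidates)
```

For example, `find_bdt_inputs("mix", "jpsi")` would scan data/test_output_mix/jpsi/ and data/test_output_mix/jpsi_wScores/.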
File: notebooks/tables_maker.ipynb
A Jupyter notebook that merges logs from Step 1 and Step 2 to produce formatted acceptance tables.
- Cutflow Visualization: Log-scale bar charts showing event survival rates.
- Physics Normalization: Injects N_miniaod and FONLL weights to calculate the absolute acceptance.
- Systematic Tables: Formats the final acceptance × efficiency table with uncertainty propagation (using the uncertainties package).
- Region Matching: Automatically filters duplicate entries to match samples to their correct kinematic region (e.g., low-q² and J/ψ).
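Because the Step 2 selection is applied on top of Step 1, the acceptance × efficiency for a sample is the product of the per-step efficiencies. This product telescopes to N_step2 / N_miniaod. The sketch below computes it with a simple binomial uncertainty using only the standard library. The notebook itself uses the `uncertainties` package, and FONLL reweighting is not modeled here. The function names and inputs are hypothetical.

```python
"""Sketch: acceptance x efficiency from cutflow counts with a binomial uncertainty."""
import math


def binomial_efficiency(n_pass, n_total):
    """Return (eff, err) with err = sqrt(eff * (1 - eff) / n_total)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)


def acceptance_times_efficiency(n_miniaod, n_step1, n_step2):
    """(n_step1 / n_miniaod) * (n_step2 / n_step1) reduces to n_step2 / n_miniaod."""
    if not n_miniaod >= n_step1 >= n_step2 >= 0:
        raise ValueError("event counts must not increase along the cutflow")
    return binomial_efficiency(n_step2, n_miniaod)
```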
Usage:
Open in Jupyter, configure the TRIGGER_TAG (e.g., 'mix'), and run all cells.
File: scripts/rk_xchecks.py
A command-line tool for rapid validation of physics results against Standard Model expectations.
- R(K) Calculation: Computes single ratios and double ratios to cancel common systematic uncertainties.
- Statistical Tension: Calculates pulls (in units of σ) to quantify agreement with PDG values.
- Era Comparison: Cross-checks Run 2 (2018) data against Run 3 (2022) for stability.
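The quantities listed above reduce to a few lines of arithmetic. The following minimal sketch makes four assumptions. Inputs are uncorrelated, so relative uncertainties add in quadrature. The efficiency-corrected single ratio is R = (N_μμ/ε_μμ)/(N_ee/ε_ee). A double ratio is the quotient of two single ratios. The pull is |x − ref|/σ. The function names are hypothetical, and the script may handle correlations differently.

```python
"""Sketch of the cross-check arithmetic, assuming uncorrelated uncertainties."""
import math


def ratio(num, den):
    """Divide two (value, error) pairs, adding relative errors in quadrature."""
    (a, da), (b, db) = num, den
    value = a / b
    return value, abs(value) * math.hypot(da / a, db / b)


def single_ratio(n_mumu, eff_mumu, n_ee, eff_ee):
    """R = (N_mumu / eff_mumu) / (N_ee / eff_ee); each argument is a (value, error) pair."""
    return ratio(ratio(n_mumu, eff_mumu), ratio(n_ee, eff_ee))


def double_ratio(r_a, r_b):
    """Quotient of two single ratios, e.g. a signal-region R over a control-region R."""
    return ratio(r_a, r_b)


def pull(measurement, reference=1.0):
    """Distance from `reference` in units of the measurement uncertainty."""
    value, error = measurement
    return abs(value - reference) / error
```

With the numbers from the example output further down, `pull((1.0312, 0.0450))` is about 0.69, which rounds to the 0.7σ shown there.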
Ensure config/xcheck_params.yml is updated with your latest yields and efficiencies:
pdg:
  br_b_to_jpsik: {value: 1.020e-3, error: 0.019e-3}
  # ...
channels:
  mumu:
    2022:
      yields: {n_jpsi: {value: 3921547, error: 1641}}
      effs: {eff_jpsi: {value: 0.00453, error: 0.00001}}
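Every leaf in this file is a {value, error} pair. The sketch below loads the file with PyYAML's `safe_load` and turns those leaves into (value, error) tuples. `load_params` and `parse_params` are hypothetical helpers, not the script's API. Note that YAML reads the unquoted era key 2022 as the integer 2022, not the string "2022".

```python
"""Sketch: load xcheck_params.yml and convert {value, error} leaves to tuples."""
import yaml


def _to_pairs(node):
    """Recursively replace {value: ..., error: ...} mappings with (value, error) tuples."""
    if isinstance(node, dict):
        if set(node) == {"value", "error"}:
            # float() also accepts forms like 1e-3, which PyYAML loads as strings.
            return (float(node["value"]), float(node["error"]))
        return {key: _to_pairs(child) for key, child in node.items()}
    return node


def parse_params(text):
    """Parse YAML text with safe_load and normalise the measurement leaves."""
    return _to_pairs(yaml.safe_load(text))


def load_params(path):
    """Read and parse a cross-check parameter file from disk."""
    with open(path) as handle:
        return parse_params(handle.read())
```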
Usage:
python scripts/rk_xchecks.py -c config/xcheck_params.yml
Example Output:
================================================================================
CROSS-CHECK RESULTS
================================================================================
--- 2022 Era (Muon + Electron) ---
R(J/ψ) [Control] : 1.0312 ± 0.0450 (Pull from 1.0: 0.7σ)
Double Ratio : 0.9850 ± 0.0620 (Pull from 1.0: 0.2σ)
...
================================================================================