R(K) Analysis Integration Framework

This repository stitches together the components of the analysis workflow. It processes the NanoAOD samples output by the r3k-preselBDT repository, applies analysis selections, incorporates MC scale factors, prepares files for the final fits in r3k-fitter, and builds the acceptance × efficiency table for the signal and control regions used in the final measurement. It also includes the machinery to perform cross-check ratio calculations using these efficiencies and fitted yields.

📦 Setup & Dependencies

This codebase can be installed in several ways. It needs only ROOT, the Python packages listed below, and an environment that can run them inside a Jupyter notebook. You can install a CMSSW distribution (CMSSW_15_0_1 recommended) or build a standalone environment with ROOT. An LCG view is most likely the quickest way to get all of these tools.

Ensure you have a Python environment (e.g., LCG View or Conda) with the following libraries:

pip install pandas numpy matplotlib seaborn uncertainties pyyaml

Note: ROOT is required for the preselection scripts but not strictly for the plotting/tables modules.
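A quick way to confirm the environment is complete is to check which modules can be found before launching the notebook. The snippet below is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the only non-obvious detail is that the pyyaml package is imported under the name yaml.

```python
import importlib.util

# Import names for the packages listed above.
# Note: the pip package "pyyaml" is imported as "yaml".
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "uncertainties", "yaml"]
OPTIONAL = ["ROOT"]  # only needed by the preselection scripts


def missing_modules(names):
    """Return the module names that cannot be found in this environment.

    Hypothetical helper for illustration; it only locates the modules
    and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```

An empty list for the required modules means the plotting and table modules should be able to start; ROOT appearing in the optional list only matters for the preselection scripts.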

📂 Repository Structure

.
├── config/
│   ├── cuts.yml                   # Definition of preselection and BDT cuts
│   ├── samples.yml                # File paths and metadata for MC/Data samples
│   └── xcheck_params.yml          # Constants (Yields/Effs/PDG) for physics cross-checks
├── data/                          # Local data storage (will be produced when the scripts are run)
│   ├── logs/                      # CSV Cutflow logs generated by scripts/
│   └── <outputs>/                 # Local ROOT files output by processing steps (can be useful to store these on EOS)
├── notebooks/
│   └── tables_maker.ipynb         # Jupyter notebook for acceptance x efficiency tables
├── scripts/
│   ├── 01_apply_preselection.py   # Step 1: Flat skimming & Preselection
│   ├── 02_apply_bdt_selection.py  # Step 2: BDT Inference & Categorization
│   └── rk_xchecks.py              # Module 4: R(K) Cross-checks & Sensitivity
└── README.md

⚙️ Analysis Pipeline

The analysis follows a sequential workflow. Logs produced in Steps 1 and 2 are consumed by Modules 3 and 4.

Step 1: Preselection & Skimming

Script: scripts/01_apply_preselection.py

Filters raw samples based on trigger paths, muon/electron validity, and kinematic preselection (defined in config/cuts.yml).

Input: Central MiniAOD/NanoAOD samples.
Output: Skimmed ROOT files (data/test_output_...) and Cutflow Log (data/logs/cutflow_step1_*.csv).
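The idea of a sequential preselection with a cutflow log can be sketched as follows. This is an illustration only: the cut names, thresholds, event fields, and CSV columns are all hypothetical (the real cuts are defined in config/cuts.yml, and the actual log schema is whatever the script writes).

```python
import csv
import io

# Hypothetical cuts for illustration; the real definitions live in config/cuts.yml.
CUTS = [
    ("trigger", lambda ev: ev["trigger_fired"]),
    ("lepton_pt", lambda ev: ev["lep_pt"] > 2.0),
    ("b_mass_window", lambda ev: 4.7 < ev["b_mass"] < 5.7),
]


def run_cutflow(events, cuts):
    """Apply cuts in order; return the surviving events and one log row per cut."""
    rows = [{"cut": "initial", "n_pass": len(events)}]
    for name, passes in cuts:
        events = [ev for ev in events if passes(ev)]
        rows.append({"cut": name, "n_pass": len(events)})
    return events, rows


def cutflow_to_csv(rows):
    """Serialize the cutflow rows to CSV text (assumed two-column layout)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["cut", "n_pass"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy events: each one fails at a different stage, except the first.
toy_events = [
    {"trigger_fired": True, "lep_pt": 5.0, "b_mass": 5.28},
    {"trigger_fired": False, "lep_pt": 5.0, "b_mass": 5.28},
    {"trigger_fired": True, "lep_pt": 1.0, "b_mass": 5.28},
    {"trigger_fired": True, "lep_pt": 3.0, "b_mass": 6.0},
]
selected, cutflow_rows = run_cutflow(toy_events, CUTS)
print(cutflow_to_csv(cutflow_rows))
```

Recording the surviving count after every cut, rather than only the final count, is what lets a later step compute per-cut and cumulative efficiencies from the log alone.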

Step 2: BDT Inference

Script: scripts/02_apply_bdt_selection.py

Applies a cut on the XGBoost discriminator score to separate signal from background. Events passing the BDT score cut are categorized into analysis bins.

Input: Skimmed ROOT files from Step 1, after BDT inference has been run on them.
Output: Final Selection ROOT files and BDT Efficiency Log (data/logs/cutflow_step2_*.csv).

📝 Note: Input Data Organization for BDT Selection

To run this step, the analysis BDT must first be run on the skimmed samples from Step 1, producing a bdt_score branch in the output ntuples. For the Step 2 script (02_apply_bdt_selection.py) to locate and process the files containing BDT scores, organize the input ROOT files using the naming convention and directory structure below.

Most importantly, the _wScores file suffix identifies the ntuples that carry the BDT information. You can place these files in the existing region-specific subdirectories or in their own _wScores region directory, as long as the file names contain the suffix. The script looks for ROOT files in the directory structure defined by your TRIGGER_TAG, so place your files there before running the script:

data/
└── test_output_{TRIGGER_TAG}/   <-- e.g., data/test_output_mix/
    ├── jpsi/
    │   ├── {SampleName}_flat_skimmed_{TRIGGER_TAG}.root
    │   └── {SampleName}_flat_skimmed_{TRIGGER_TAG}_wScores.root <-- This file will be used as the Step 2 input
    └── jpsi_wScores/ <-- Optionally the script will look for the BDT files here as well
        └── {SampleName}_flat_skimmed_{TRIGGER_TAG}_wScores.root

If the files are missing or in the wrong subfolders (e.g., all in one folder instead of split by jpsi/lowq2), the script will fail to categorize the events correctly.
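To check a layout before running Step 2, the lookup described above can be approximated with a short script. This is a sketch under assumptions: find_scored_files is a hypothetical helper, not the function used by 02_apply_bdt_selection.py, and the rule that the region directory takes precedence over the _wScores directory when both hold the same file is an illustrative choice.

```python
from pathlib import Path


def find_scored_files(base_dir, trigger_tag, region):
    """Collect *_wScores.root files for one region (hypothetical helper).

    Searches both <region>/ and <region>_wScores/ under
    <base_dir>/test_output_<trigger_tag>/, following the layout above.
    """
    root = Path(base_dir) / f"test_output_{trigger_tag}"
    pattern = f"*_flat_skimmed_{trigger_tag}_wScores.root"
    found = {}
    for subdir in (region, f"{region}_wScores"):
        directory = root / subdir
        if not directory.is_dir():
            continue  # a missing directory is simply skipped
        for path in directory.glob(pattern):
            # Assumption: if the same file name exists in both places,
            # keep the one found first (the plain region directory).
            found.setdefault(path.name, path)
    return sorted(found.values(), key=lambda p: p.name)
```

Calling find_scored_files("data", "mix", "jpsi") returns the scored ntuples for the J/ψ region; an empty list is a sign that the files are missing or sit in the wrong subfolder.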

📊 Module 3: Efficiency Tables & Plots

File: notebooks/tables_maker.ipynb

A Jupyter notebook that merges logs from Step 1 and Step 2 to produce formatted acceptance tables.

Features

Cutflow Visualization: Log-scale bar charts showing event survival rates.
Physics Normalization: Injects N_miniaod and FONLL weights to calculate absolute acceptance.
Systematic Tables: Formats the final acceptance × efficiency table with robust uncertainty propagation (using the uncertainties package).
Region Matching: Automatically filters duplicate entries to match samples to their correct kinematic region (e.g., low-q² or J/ψ).

Usage: Open in Jupyter, configure the TRIGGER_TAG (e.g., 'mix'), and run all cells.
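As a rough illustration of the kind of number the table holds, the sketch below computes a cumulative acceptance × efficiency from event counts with a simple binomial uncertainty. It is not the notebook's implementation: the notebook uses the uncertainties package and applies FONLL weights, both of which are left out here, and the counts are invented for the example.

```python
import math


def binomial_efficiency(n_pass, n_total):
    """Efficiency n_pass / n_total with its binomial uncertainty.

    Simplified stand-in for the notebook's propagation; unweighted counts only.
    """
    if n_total <= 0 or not 0 <= n_pass <= n_total:
        raise ValueError("need 0 <= n_pass <= n_total and n_total > 0")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


# Invented counts for illustration: generated events (N_miniaod) and the
# events surviving Step 1 and Step 2.
n_miniaod = 1_000_000
n_after_step1 = 50_000
n_after_step2 = 5_000

eff_step1, err_step1 = binomial_efficiency(n_after_step1, n_miniaod)
acc_eff, acc_eff_err = binomial_efficiency(n_after_step2, n_miniaod)
print(f"Step 1 efficiency     : {eff_step1:.4f} +/- {err_step1:.4f}")
print(f"Acceptance x eff.     : {acc_eff:.5f} +/- {acc_eff_err:.5f}")
```

Computing the cumulative value directly against N_miniaod, instead of multiplying per-step efficiencies and their errors, avoids treating the correlated steps as independent.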

🔬 Module 4: Physics Cross-Checks

File: scripts/rk_xchecks.py

A command-line tool for rapid validation of physics results against Standard Model expectations.

Features

R(K) Calculation: Computes ratios and double ratios to cancel systematics.
Statistical Tension: Calculates "Sigma" pulls to quantify agreement with PDG values.
Era Comparison: Cross-checks Run 2 (2018) data against Run 3 (2022) for stability.
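The arithmetic behind these checks can be sketched in a few lines. The functions below are hypothetical and assume uncorrelated inputs with first-order error propagation; they are not taken from rk_xchecks.py, which may treat correlations differently.

```python
import math


def corrected_yield(n, n_err, eff, eff_err):
    """Efficiency-corrected yield N / eff, assuming uncorrelated errors."""
    value = n / eff
    rel_err = math.hypot(n_err / n, eff_err / eff)
    return value, value * rel_err


def ratio(a, a_err, b, b_err):
    """Ratio a / b with relative errors added in quadrature (uncorrelated)."""
    value = a / b
    return value, value * math.hypot(a_err / a, b_err / b)


def pull(value, error, expected=1.0):
    """Distance from the expected value in units of the uncertainty."""
    return abs(value - expected) / error


# A ratio of 1.0312 +/- 0.0450 compared with an expectation of 1.0:
print(f"Pull from 1.0: {pull(1.0312, 0.0450):.1f} sigma")
```

A double ratio is then just ratio applied to two such ratios, which is where the common efficiency systematics are expected to cancel.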

Configuration

Ensure config/xcheck_params.yml is updated with your latest yields and efficiencies:

pdg:
  br_b_to_jpsik: {value: 1.020e-3, error: 0.019e-3}
  # ...
channels:
  mumu:
    2022:
      yields: {n_jpsi: {value: 3921547, error: 1641}}
      effs:   {eff_jpsi: {value: 0.00453, error: 0.00001}}
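Every measurement in this file is a mapping with a value and an error key, so the parsed configuration can be flattened into (value, error) pairs with a small recursive walk. The sketch below is illustrative: collect_measurements is a hypothetical helper, and the dictionary is written out by hand to mirror the fragment above rather than loaded from disk (in practice it would come from yaml.safe_load).

```python
def collect_measurements(node, prefix=()):
    """Flatten a nested config into {key_path: (value, error)}.

    Hypothetical helper: any mapping whose keys are exactly
    {"value", "error"} is treated as one measurement.
    """
    out = {}
    if isinstance(node, dict):
        if set(node) == {"value", "error"}:
            # float() also covers numbers a YAML loader may return as strings.
            value, error = float(node["value"]), float(node["error"])
            if error < 0:
                raise ValueError(f"negative error at {prefix}")
            out[prefix] = (value, error)
        else:
            for key, child in node.items():
                out.update(collect_measurements(child, prefix + (key,)))
    return out


# Hand-written mirror of the YAML fragment above.
params = {
    "pdg": {"br_b_to_jpsik": {"value": 1.020e-3, "error": 0.019e-3}},
    "channels": {
        "mumu": {
            2022: {
                "yields": {"n_jpsi": {"value": 3921547, "error": 1641}},
                "effs": {"eff_jpsi": {"value": 0.00453, "error": 0.00001}},
            }
        }
    },
}

for path, (value, error) in collect_measurements(params).items():
    print(path, value, error)
```

Running a check like this after editing the file catches a missing error key or a negative uncertainty before the cross-check script is launched.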

Usage:

python scripts/rk_xchecks.py -c config/xcheck_params.yml

Example Output:

================================================================================
CROSS-CHECK RESULTS
================================================================================
--- 2022 Era (Muon + Electron) ---
R(J/ψ) [Control]                     : 1.0312 ± 0.0450      (Pull from 1.0: 0.7σ)
Double Ratio                         : 0.9850 ± 0.0620      (Pull from 1.0: 0.2σ)
...
================================================================================

DiElectronX/r3k-integration
