maplexgitx0302/NTUHEPML-CWoLa

Deep Learning + CWoLa for VBF vs. GGF Classification

This project applies deep learning to distinguish between the vector boson fusion (VBF) and gluon-gluon fusion (GGF) Higgs production modes using the Classification Without Labels (CWoLa) framework. The approach is inspired by the paper "Classification without labels: Learning from mixed samples in high energy physics", which introduces CWoLa as a viable strategy for training classifiers directly on mixed samples of real data, without per-event truth labels.
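The CWoLa idea can be illustrated with a toy example. The sketch below is not the code of this repository: it assumes a single Gaussian feature, two mixed samples with hypothetical signal fractions of 0.7 and 0.3, and a one-parameter logistic regression in place of a CNN or ParT. The classifier is trained only to tell the two mixed samples apart, and is then evaluated against the true signal/background labels, which it never saw during training.

```python
import math
import random


def sample_mixed(n, signal_fraction, rng):
    """Draw n toy events; each is signal with probability signal_fraction."""
    xs, is_signal = [], []
    for _ in range(n):
        sig = rng.random() < signal_fraction
        # Toy assumption: signal ~ N(+1, 1), background ~ N(-1, 1).
        xs.append(rng.gauss(1.0 if sig else -1.0, 1.0))
        is_signal.append(sig)
    return xs, is_signal


def train_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit p(y=1|x) = sigmoid(w*x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
x1, truth1 = sample_mixed(1000, 0.7, rng)  # signal-rich mixed sample
x2, truth2 = sample_mixed(1000, 0.3, rng)  # signal-poor mixed sample

xs = x1 + x2
mixed_labels = [1] * len(x1) + [0] * len(x2)  # only sample membership is used

w, b = train_logistic(xs, mixed_labels)

# Evaluate against the true labels, which were never used in training.
scores = [w * x + b for x in xs]
signal_auc = auc(scores, truth1 + truth2)
print(f"w = {w:.3f}, AUC on true signal vs. background = {signal_auc:.3f}")
```

Because the likelihood ratio between the two mixed samples is a monotonic function of the signal-to-background likelihood ratio, a classifier that separates the mixed samples also ranks signal above background, provided the two signal fractions differ.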


Environment Setup

  1. Download and install Miniconda:

    # Assuming a Linux system
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    
    # Install with sh. Type 'yes' when asked whether to initialize conda automatically.
    sh Miniconda3-latest-Linux-x86_64.sh
  2. Create a virtual environment:

    # Initialization
    conda env create -f environment.yml
    
    # Update after environment.yml has changed
    conda env update -f environment.yml
  3. Select the environment directly in Jupyter, or activate and deactivate it from the shell:

    conda activate cwola
    conda deactivate
  4. (Optional) Create a .env file so that VSCode can locate the installed packages:

    PYTHONPATH=~/miniconda3/envs/cwola/lib/python3.12/site-packages
    

Models

Convolutional Neural Networks (CNN)

Particle Transformers (ParT)

  • ParT_Baseline: A transformer architecture based on the paper "Particle Transformer for Jet Tagging". This model captures particle-level features using attention mechanisms tailored to jet tagging tasks.

  • ParT_*: A family of lighter variants derived from ParT_Baseline, offering faster training and inference with reduced computational cost.

Usage

Data Preprocessing & Augmentation

Data preprocessing and augmentation are configured in three steps:

  1. Check the supported methods:
    • Data preprocessing: see the methods of the class src.data_preprocess.MCSimData.
    • Data augmentation: see the functions in src.data_augment.
  2. Assign abbreviations to the preprocessing/augmentation methods in LitDataModule.__init__ (see ./notebooks/training.ipynb).
  3. Select which preprocessing/augmentation methods to use through the YAML files in ./config named exp_*.yml.
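The abbreviation-based selection described in these steps can be sketched as a small registry. This is a hedged illustration only: the function names `shift_phi` and `flip_eta`, the abbreviations "phi" and "eta", the `augmentations` key, and the (pt, eta, phi) event layout are all hypothetical and are not taken from src.data_augment, LitDataModule, or the exp_*.yml files. A plain dictionary stands in for a parsed YAML config.

```python
import math


def shift_phi(event, delta=math.pi):
    """Hypothetical augmentation: rotate each particle's phi by delta, wrapped to [-pi, pi)."""
    return [
        (pt, eta, (phi + delta + math.pi) % (2 * math.pi) - math.pi)
        for pt, eta, phi in event
    ]


def flip_eta(event):
    """Hypothetical augmentation: mirror each particle in pseudorapidity."""
    return [(pt, -eta, phi) for pt, eta, phi in event]


# Hypothetical abbreviation -> function mapping (step 2).
AUGMENTATIONS = {"phi": shift_phi, "eta": flip_eta}


def build_pipeline(abbreviations):
    """Turn a list of abbreviations into one function applying them in order."""
    steps = [AUGMENTATIONS[name] for name in abbreviations]  # KeyError if unknown

    def apply(event):
        for step in steps:
            event = step(event)
        return event

    return apply


# Stand-in for the content of a parsed experiment config (step 3).
config = {"augmentations": ["eta", "phi"]}

pipeline = build_pipeline(config["augmentations"])
event = [(50.0, 1.2, 0.5), (30.0, -0.7, -2.0)]  # toy (pt, eta, phi) particles
augmented = pipeline(event)
print(augmented)
```

Keeping the mapping in one place means an experiment config only has to list short names, and an unknown abbreviation fails immediately with a KeyError instead of being silently ignored.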

About

Applying "CWoLa" on simulated Higgs dataset with CNN and Particle Transformer.