| 1 | +--- |
| 2 | +title: Background Enrichment augmented Anomaly Detection (BEAD) for new physics searches at LHC |
| 3 | +layout: gsoc_proposal |
| 4 | +project: BEAD |
| 5 | +year: 2025 |
| 6 | +organization: |
| 7 | + - SMARTHEP |
| 8 | + - UManchester |
| 9 | +difficulty: medium |
| 10 | +duration: 350 |
| 11 | +mentor_avail: June-August |
| 12 | +--- |
| 13 | + |
| 14 | +## Short description of the project |
| 15 | +A long-standing mystery of fundamental physics is the nature of dark matter (DM), |
| 16 | +a type of matter that has little interaction with ordinary matter. Its existence is supported by |
| 17 | +various astrophysical and cosmological observations, and it is roughly five times more abundant |
| 18 | +than ordinary matter in the universe. |
| 19 | +Several Large Hadron Collider (LHC) experiments are conducting |
| 20 | +searches aimed at detecting dark matter. Unsupervised and semi-supervised |
| 21 | +outlier detection techniques are well suited to these searches because they |
| 22 | +cast a wide net over the many ways in which dark |
| 23 | +matter could manifest: they impose minimal constraints from the details of specific physics |
| 24 | +models and instead learn to separate the characteristics of rare signals |
| 25 | +starting from knowledge of the background they have been trained on. |
| 26 | +Developing innovative search techniques for probing dark matter signatures |
| 27 | +is crucial for broadening the DM search program at the LHC. BEAD |
| 28 | +is a Python package that uses deep-learning-based methods for anomaly detection |
| 29 | +in high-energy physics (HEP) data for such new physics searches. BEAD has been designed with modularity in |
| 30 | +mind, to enable the use of various unsupervised latent variable models for any task. |
| 31 | + |
| 32 | +BEAD has five main running modes: |
| 33 | + |
| 34 | + 1. Data handling: Handles file types, conversions between them, and |
| 35 | +pre-processing of the data fed as input to the deep learning (DL) models. |
| 36 | + |
| 37 | + 2. Training: Trains a model to learn implicit representations of |
| 38 | +the background data, which may come from multiple sources (or generators), |
| 39 | +to obtain a single, enriched latent representation of it. |
| 40 | + |
| 41 | + 3. Inference: Using a model trained on an enriched background, the user can |
| 42 | +feed in samples in which to detect anomalies. |
| 43 | + |
| 44 | + 4. Plotting: After running Inference or Training, one can generate plots. |
| 45 | +These include performance plots as well as different visualizations of the learned data. |
| 46 | + |
| 47 | + 5. Diagnostics: Enabling this mode runs profilers that measure |
| 48 | +a host of metrics on the usage of the compute node (CPU and GPU metrics) to |
| 49 | +help optimize the code. |
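
The Training and Inference modes above rest on one idea: fit a model to background only, then flag events that the model reconstructs poorly. As a rough illustration only (this is not BEAD's code or API), the sketch below uses a one-dimensional linear stand-in for an autoencoder, fitted by power iteration in plain Python. The function names `fit_linear_autoencoder` and `anomaly_score` and the toy two-feature data are assumptions made for this example.

```python
import random


def fit_linear_autoencoder(background, n_iter=200):
    """Fit a 1-D linear stand-in for an autoencoder on background events only.

    Hypothetical helper: returns the feature means and the leading principal
    direction of the background sample, found by power iteration.
    """
    n, d = len(background), len(background[0])
    mean = [sum(row[j] for row in background) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in background]
    # Covariance matrix of the centered background sample.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    w = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return mean, w


def anomaly_score(event, mean, w):
    """Squared reconstruction error after encoding to 1-D and decoding back."""
    x = [xj - mj for xj, mj in zip(event, mean)]
    z = sum(xj * wj for xj, wj in zip(x, w))  # encode: project onto w
    recon = [z * wj for wj in w]              # decode: map back to feature space
    return sum((xj - rj) ** 2 for xj, rj in zip(x, recon))


random.seed(0)
# Toy background (invented): two strongly correlated features plus small noise.
background = []
for _ in range(500):
    t = random.gauss(0.0, 1.0)
    background.append([t + random.gauss(0.0, 0.05),
                       2.0 * t + random.gauss(0.0, 0.05)])

mean, w = fit_linear_autoencoder(background)
typical_event = [1.0, 2.0]     # follows the background correlation
anomalous_event = [2.0, -1.0]  # breaks the background correlation
score_typical = anomaly_score(typical_event, mean, w)
score_anomalous = anomaly_score(anomalous_event, mean, w)
print(score_typical, score_anomalous)
```

In this toy setup the event that breaks the background correlation receives a far larger reconstruction error than the one that follows it. A deep autoencoder written in PyTorch would replace the linear projection, but the train-on-background, score-by-reconstruction pattern would stay the same.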
| 50 | + |
| 51 | +The package is under active development. |
| 52 | +The student in this project will work on the machine learning models available |
| 53 | +in BEAD and implement new models to perform anomaly detection, initially on simulated data. |
| 54 | + |
| 55 | +## Task ideas |
| 56 | + |
| 57 | +Possible projects include: |
| 58 | + |
| 59 | + * New auto-encoder models could be developed |
| 60 | +that better identify correlations between data objects in a given particle physics dataset entry |
| 61 | +(containing event-level and/or physics-object-level information). |
| 62 | +New models could also improve performance on live / unseen data. |
| 63 | +These could include transformers, GNNs, probabilistic networks, and other types of networks. |
| 64 | + * Existing models could be tested on different datasets, |
| 65 | +potentially identifying distinct latent spaces populated by the different |
| 66 | +LHC physics processes, which could enable improved anomaly detection. |
| 67 | + |
| 68 | +Ideas from the student working on this project are also welcome. |
| 69 | + |
| 70 | +## Expected results |
| 71 | + |
| 72 | +Improved performance of selected models, with documentation and figures of merit that may include: |
| 73 | + * Plots made in matplotlib that demonstrate the performance of the new models compared to the old |
| 74 | + * Documentation of the design choices made for the improved models |
| 75 | + * Documented evaluation of a physics analysis on data before and after compression |
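
One figure of merit commonly used to compare anomaly detectors is the area under the ROC curve (AUC). The following is a minimal sketch, assuming each model outputs one anomaly score per event. The function `roc_auc` and all score lists are invented for illustration and are not BEAD results.

```python
def roc_auc(background_scores, signal_scores):
    """AUC as the probability that a random signal event scores above a
    random background event, with ties counted as one half."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Made-up anomaly scores for a hypothetical "old" and "new" model.
old_bkg = [0.10, 0.20, 0.35, 0.40]
old_sig = [0.30, 0.50, 0.15]
new_bkg = [0.10, 0.15, 0.20, 0.25]
new_sig = [0.30, 0.50, 0.22]

auc_old = roc_auc(old_bkg, old_sig)  # 7 of 12 signal-background pairs ordered correctly
auc_new = roc_auc(new_bkg, new_sig)  # 11 of 12 pairs ordered correctly
print(f"old model AUC = {auc_old:.3f}, new model AUC = {auc_new:.3f}")
```

The pairwise loop is quadratic in the number of events, which is fine for a sketch; for real samples a rank-based or library implementation would be the more practical choice.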
| 76 | + |
| 77 | +## Requirements |
| 78 | + |
| 79 | + * Python |
| 80 | + * Linux environment |
| 81 | + * Key concepts of ML / unsupervised algorithms |
| 82 | + * PyTorch |
| 83 | + |
| 84 | + * Desired skills: transformers and/or graph neural networks, particle physics theory and experiments, particle physics simulations |
| 85 | + |
| 86 | + |
| 87 | +## Mentors |
| 88 | + * ***[Pratik Jawahar](mailto:[email protected])*** |
| 89 | + * ***[Sukanya Sinha](mailto:[email protected])*** |
| 90 | + * [Caterina Doglioni](mailto:[email protected]) as backup mentor |
| 91 | + |
| 92 | +## Links |
| 93 | + * [Paper on unsupervised ML algorithms using HEP datasets](<https://arxiv.org/abs/2105.14027>) |
| 94 | + * [Review of LHC searches using unsupervised learning](<https://arxiv.org/abs/2312.14190>) |
| 95 | + * [BEAD GitHub repository (WIP)](<https://github.com/PRAkTIKal24/BEAD>) |
| 96 | + * [ROOT](<https://root.cern/>) |
| 97 | + * [Jupyter](<http://jupyter.org>) |
| 98 | + * [PyTorch](http://pytorch.org) |