Added BEAD proposal (#1694)

caterina-doglioni · web-flow · commit db4bf5f2c933 · 2025-02-13T09:45:48.000+01:00
* Create proposal_SMARTHEP_BEAD.md

* Update mentors.md for BEAD proposal

* Update proposal_SMARTHEP_BEAD.md

added link to WIP repo
diff --git a/_gsocproposals/2025/proposal_SMARTHEP_BEAD.md b/_gsocproposals/2025/proposal_SMARTHEP_BEAD.md
@@ -0,0 +1,98 @@
+---
+title: Background Enrichment augmented Anomaly Detection (BEAD) for new physics searches at LHC
+layout: gsoc_proposal
+project: BEAD
+year: 2025
+organization:
+  - SMARTHEP
+  - UManchester
+difficulty: medium
+duration: 350
+mentor_avail: June-August 
+---
+
+## Short description of the project
+A long-standing mystery of fundamental physics is the nature of dark matter (DM),
+a type of matter that interacts only weakly with ordinary matter. Its existence is supported by
+various astrophysical and cosmological observations, and it is roughly five times more abundant
+than ordinary matter in the universe.
+Several Large Hadron Collider (LHC) experiments are conducting
+searches aimed at detecting dark matter. Unsupervised and semi-supervised
+outlier detection techniques are well suited to these searches
+because they cast a wide net over the many ways in which dark
+matter could manifest: rather than imposing constraints from a specific physics
+model, they learn to separate rare signals from the background
+they have been trained on.
+Developing innovative search techniques for probing dark matter signatures
+is crucial for broadening the DM search program at the LHC. BEAD
+is a Python package that uses deep-learning-based methods for anomaly detection
+in high-energy physics (HEP) data for such new physics searches. BEAD has been designed with modularity in
+mind, so that various unsupervised latent variable models can be used for any task.
+
+BEAD has five main running modes:
+
+   1. Data handling: Handles file types, conversions between them, and
+pre-processing of the data that is fed as input to the deep learning (DL) models.
+
+   2. Training: Trains a model to learn implicit representations of
+the background data, which may come from multiple sources (or generators),
+to obtain a single, enriched latent representation of it.
+
+   3. Inference: Using a model trained on an enriched background, the user can
+feed in samples in which to search for anomalies.
+
+   4. Plotting: After running Inference, or Training, one can generate plots. 
+These include performance plots as well as different visualizations of the learned data.
+
+   5. Diagnostics: Enabling this mode runs profilers that measure
+a host of metrics on the usage of the compute node (CPU and GPU) to
+help optimize the code.
+
+The package is under active development. 
+The student in this project will work on the machine learning models available
+in BEAD and implement new models to perform anomaly detection, initially on simulated data.
+
+## Task ideas
+
+Possible projects include:
+
+  * New auto-encoder models could be developed, 
+better identifying correlations between data objects in a given particle physics dataset entry 
+(containing event level and/or physics object level information). 
+New models could also improve performance on live / unseen data. 
+These could include transformer, GNN, probabilistic and other types of networks.
+  * Existing models could be tested on different datasets, 
+potentially identifying distinct latent spaces populated by the different 
+LHC physics processes, which could enable improved anomaly detection.
+
+Ideas from the student working on this project are also welcome.
+
+## Expected results
+
+An improved performance of selected models, with documentation and figures of merit that may include:
+  * Plots made in matplotlib that demonstrate the performance of the new models compared to the old
+  * Documentation of the design choices made for the improved models
+  * Documented evaluation of a physics analysis on data before and after compression
+
+## Requirements
+
+   * Python
+   * Linux environment
+   * Key concepts of ML and unsupervised learning algorithms
+   * PyTorch
+
+   * Desired skills: transformers and/or graph neural networks, particle physics theory and experiments, particle physics simulations
+
+
+## Mentors
+  * ***[Pratik Jawahar](mailto:pratik.jawahar@cern.ch)***
+  * ***[Sukanya Sinha](mailto:sukanya.sinha@cern.ch)***
+  * [Caterina Doglioni](mailto:caterina.doglioni@cern.ch) as backup mentor
+
+## Links
+  * [Paper on unsupervised ML algorithms using HEP datasets](<https://arxiv.org/abs/2105.14027>)
+  * [Review of LHC searches using unsupervised learning](<https://arxiv.org/abs/2312.14190>)
+  * [BEAD GitHub repository (WIP)](<https://github.com/PRAkTIKal24/BEAD>)
+  * [ROOT](<https://root.cern/>)
+  * [Jupyter](<http://jupyter.org>)
+  * [PyTorch](<http://pytorch.org>)
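
The Training and Inference modes described in the proposal above can be illustrated with a minimal sketch. This is not BEAD code and does not use BEAD's API: it swaps the deep latent variable models for a one-component linear autoencoder (equivalent to PCA) written with the Python standard library only. The names `toy_generator`, `fit_linear_autoencoder`, and `anomaly_score`, the toy data, and the 99th-percentile threshold are all assumptions made for illustration. The idea shown is the one in the proposal: fit on background pooled from several sources, then flag events the model reconstructs poorly.

```python
import math
import random


def toy_generator(seed, n, noise):
    """Hypothetical background source: 3-feature events lying near one line."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        events.append([
            1.0 * t + rng.gauss(0.0, noise),
            2.0 * t + rng.gauss(0.0, noise),
            0.5 * t + rng.gauss(0.0, noise),
        ])
    return events


def fit_linear_autoencoder(background, n_iter=50):
    """Fit a 1-D linear autoencoder (first principal component) to background."""
    n, d = len(background), len(background[0])
    mean = [sum(row[j] for row in background) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in background]
    # Covariance matrix of the centered background events.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the leading eigenvector, which serves as
    # both the encoder and the decoder direction.
    w = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return mean, w


def anomaly_score(event, mean, w):
    """Squared reconstruction error after encoding to one latent value."""
    x = [e - m for e, m in zip(event, mean)]
    z = sum(xi * wi for xi, wi in zip(x, w))   # encode
    recon = [z * wi for wi in w]               # decode
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


# "Enriched" background: pool events from two hypothetical generators.
background = toy_generator(1, 500, 0.05) + toy_generator(2, 500, 0.08)
mean, w = fit_linear_autoencoder(background)

# Assumed working point: the 99th percentile of background scores.
scores = sorted(anomaly_score(e, mean, w) for e in background)
threshold = scores[int(0.99 * len(scores))]

typical = [1.0, 2.0, 0.5]      # lies along the background direction
candidate = [2.0, -1.0, 3.0]   # lies far from it
print("typical flagged:", anomaly_score(typical, mean, w) > threshold)
print("candidate flagged:", anomaly_score(candidate, mean, w) > threshold)
```

A deep autoencoder would replace the single direction `w` with nonlinear encoder and decoder networks, but the scoring step (reconstruction error compared against a threshold set on background) stays the same in this kind of setup.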
diff --git a/gsoc/2025/mentors.md b/gsoc/2025/mentors.md
@@ -16,6 +16,7 @@ layout: plain
 * Mateusz Fila [mateusz.jakub.fila@cern.ch](mailto:mateusz.jakub.fila@cern.ch) CERN
 * Tobias Fitschen [tobias.fitschen@cern.ch](mailto:tobias.fitschen@cern.ch) UManchester
 * Chris Gutschow [chris.g@cern.ch](mailto:chris.g@cern.ch) UCLondon
+* Pratik Jawahar [pratik.jawahar@postgrad.manchester.ac.uk](mailto:pratik.jawahar@postgrad.manchester.ac.uk) UManchester
 * Aaron Jomy [aaron.jomy@cern.ch](mailto:aaron.jomy@cern.ch) CERN/CompRes
 * Stephan Lachnit [stephan.lachnit@desy.de](mailto:stephan.lachnit@desy.de) DESY
 * David Lange [david.lange@cern.ch](mailto:david.lange@cern.ch) CompRes
@@ -27,7 +28,8 @@ layout: plain
 * Felice Pantaleo [felice.pantaleo@cern.ch](mailto:felice.pantaleo@cern.ch) CERN
 * Giacomo Parolini [giacomo.parolini@cern.ch](mailto:giacomo.parolini@cern.ch) CERN
 * Alexander Penev [alexander.p.penev@gmail.com](mailto:alexander.p.penev@gmail.com) CompRes/University of Plovdiv, BG
-* Sanjiban Sengupta [sanjiban.sengupta@cern.ch](mailto:sanjiban.sengupta@cern.ch) CERN/UofManchester
+* Sukanya Sinha [sukanya.sinha@manchester.ac.uk](mailto:sukanya.sinha@manchester.ac.uk) UManchester
+* Sanjiban Sengupta [sanjiban.sengupta@cern.ch](mailto:sanjiban.sengupta@cern.ch) CERN/UManchester
 * James Smith [james.smith-7@manchester.ac.uk](mailto:james.smith-7@manchester.ac.uk) UManchester
 * Mayank Sharma [mayank.sharma@cern.ch](mailto:mayank.sharma@cern.ch) UMich
 * Simon Spannagel [simon.spannagel@desy.de](mailto:simon.spannagel@desy.de) DESY
