Commit db4bf5f

Added BEAD proposal (#1694)
* Create proposal_SMARTHEP_BEAD.md
* Update mentors.md for BEAD proposal
* Update proposal_SMARTHEP_BEAD.md
  added link to WIP repo
1 parent e1564d9 commit db4bf5f

2 files changed

+101
-1
lines changed

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
---
2+
title: Background Enrichment augmented Anomaly Detection (BEAD) for new physics searches at LHC
3+
layout: gsoc_proposal
4+
project: BEAD
5+
year: 2025
6+
organization:
7+
- SMARTHEP
8+
- UManchester
9+
difficulty: medium
10+
duration: 350
11+
mentor_avail: June-August
12+
---
13+
14+
## Short description of the project
15+
A long-standing mystery of fundamental physics is the existence of dark matter (DM),
16+
a type of matter that has little interaction with ordinary matter but is supported by
17+
various astrophysical and cosmological observations and is roughly five times more abundant
18+
than ordinary matter in the universe.
19+
Several Large Hadron Collider (LHC) experiments are conducting
20+
searches aimed at detecting dark matter. Unsupervised and semi-supervised
21+
learning outlier detection techniques are advantageous to these searches,
22+
for casting a wide net on a variety of possibilities for how dark
23+
matter manifests, as they impose minimal constraints from specific physics
24+
model details, but rather learn to separate characteristics of rare signals
25+
starting from the knowledge of the background they’ve been trained on.
26+
Developing innovative search techniques for probing dark matter signatures
27+
is crucial for broadening the DM search program at the LHC, and BEAD
28+
is a Python package that uses deep learning based methods for anomaly detection
29+
in HEP data for such new physics searches. BEAD has been designed with modularity in
30+
mind, to enable usage of various unsupervised latent variable models for any task.
31+
32+
BEAD has five main running modes:
33+
34+
1. Data handling: Deals with handling file types, conversions between them and
35+
pre-processing the data to feed as inputs to the DL models.
36+
37+
2. Training: Trains a model to learn implicit representations of
38+
the background data, which may come from multiple sources (or generators),
39+
to get a single, enriched latent representation of it.
40+
41+
3. Inference: Using a model trained on an enriched background, the user can
42+
feed in samples in which to detect anomalies.
43+
44+
4. Plotting: After running Inference, or Training, one can generate plots.
45+
These include performance plots as well as different visualizations of the learned data.
46+
47+
5. Diagnostics: Enabling this mode allows running profilers that measure
48+
a host of metrics connected to the usage of the compute node to
49+
help optimization of the code (using CPU-GPU metrics).
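
The Training and Inference modes above follow the usual pattern for autoencoder-based anomaly detection: fit a model on background only, then score new events by how poorly they are reconstructed. The block below is a minimal sketch of that pattern in PyTorch, not BEAD's actual code. `TinyAutoencoder`, `train_on_background` and `anomaly_scores` are hypothetical names, and the Gaussian toy samples stand in for real HEP inputs.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Hypothetical stand-in model; not one of BEAD's architectures."""

    def __init__(self, n_features: int, latent_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 8), nn.ReLU(), nn.Linear(8, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 8), nn.ReLU(), nn.Linear(8, n_features)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_on_background(model, background, epochs=200, lr=1e-2):
    """Fit the autoencoder to reconstruct background events only."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(background), background)
        loss.backward()
        optimizer.step()
    return model


def anomaly_scores(model, events):
    """Per-event mean squared reconstruction error (higher = more anomalous)."""
    model.eval()
    with torch.no_grad():
        return ((model(events) - events) ** 2).mean(dim=1)


torch.manual_seed(0)
background = torch.randn(512, 4)         # toy "background" events (assumption)
signal = torch.randn(16, 4) * 0.5 + 6.0  # toy out-of-distribution events (assumption)

model = train_on_background(TinyAutoencoder(n_features=4), background)
bkg_scores = anomaly_scores(model, background)
sig_scores = anomaly_scores(model, signal)
```

Events whose score lies far above the bulk of `bkg_scores` would be flagged as anomaly candidates; where to place that threshold is an analysis choice this sketch does not make.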
50+
51+
The package is under active development.
52+
The student in this project will work on the machine learning models available
53+
in BEAD and implement new models to perform anomaly detection, initially on simulated data.
54+
55+
## Task ideas
56+
57+
Possible projects include:
58+
59+
* New auto-encoder models could be developed,
60+
better identifying correlations between data objects in a given particle physics dataset entry
61+
(containing event level and/or physics object level information).
62+
New models could also improve performance on live / unseen data.
63+
These could include transformer, GNN, probabilistic and other types of networks.
64+
* Existing models could be tested on different datasets,
65+
potentially identifying distinct latent spaces populated by the different
66+
LHC physics processes, that can enable improved anomaly detection.
67+
68+
Ideas from the student working on this project are also welcome.
69+
70+
## Expected results
71+
72+
An improved performance of selected models, with documentation and figures of merit that may include:
73+
* Plots made in matplotlib that demonstrate the performance of the new models compared to the old ones
74+
* Documentation of the design choices made for the improved models
75+
* Documented evaluation of a physics analysis on data before and after compression
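
One common figure of merit for comparing an old and a new model is the ROC AUC of their anomaly scores. The proposal does not prescribe this metric, so treat the block below as one possible choice. It is a dependency-free sketch; `roc_auc` is a hypothetical helper and the score lists are made-up illustration values, not results.

```python
def roc_auc(background_scores, signal_scores):
    """Probability that a random signal event scores higher than a random
    background event, with ties counted as half (Mann-Whitney form of AUC)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Made-up anomaly scores for illustration only.
background_scores = [0.1, 0.4, 0.35, 0.8]
signal_scores = [0.9, 0.3, 0.7]

auc = roc_auc(background_scores, signal_scores)
```

An AUC of 0.5 corresponds to no separation between signal and background, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of events, so a rank-based implementation would be preferable for large samples.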
76+
77+
## Requirements
78+
79+
* Python
80+
* Linux environment
81+
* Key concepts of ML / unsupervised algorithms
82+
* PyTorch
83+
84+
* Desired skills: transformers and/or graph neural networks, particle physics theory and experiments, particle physics simulations
85+
86+
87+
## Mentors
88+
* ***[Pratik Jawahar](mailto:[email protected])***
89+
* ***[Sukanya Sinha](mailto:[email protected])***
90+
* [Caterina Doglioni](mailto:[email protected]) as backup mentor
91+
92+
## Links
93+
* [Paper on unsupervised ML algorithms using HEP datasets](<https://arxiv.org/abs/2105.14027>)
94+
* [Review of LHC searches using unsupervised learning](<https://arxiv.org/abs/2312.14190>)
95+
* [BEAD GitHub repository (WIP)](<https://github.com/PRAkTIKal24/BEAD>)
96+
* [ROOT](<https://root.cern/>)
97+
* [Jupyter](<http://jupyter.org>)
98+
* [PyTorch](http://pytorch.org)

gsoc/2025/mentors.md

Lines changed: 3 additions & 1 deletion
@@ -16,6 +16,7 @@ layout: plain
1616
* Mateusz Fila [[email protected]](mailto:[email protected]) CERN
1717
* Tobias Fitschen [[email protected]](mailto:[email protected]) UManchester
1818
* Chris Gutschow [[email protected]](mailto:[email protected]) UCLondon
19+
* Pratik Jawahar [[email protected]](mailto:[email protected]) UManchester
1920
* Aaron Jomy [[email protected]](mailto:[email protected]) CERN/CompRes
2021
* Stephan Lachnit [[email protected]](mailto:[email protected]) DESY
2122
* David Lange [[email protected]](mailto:[email protected]) CompRes
@@ -27,7 +28,8 @@ layout: plain
2728
* Felice Pantaleo [[email protected]](mailto:[email protected]) CERN
2829
* Giacomo Parolini [[email protected]](mailto:[email protected]) CERN
2930
* Alexander Penev [[email protected]](mailto:[email protected]) CompRes/University of Plovdiv, BG
30-
* Sanjiban Sengupta [[email protected]](mailto:[email protected]) CERN/UofManchester
31+
* Sukanya Sinha [[email protected]](mailto:[email protected]) UManchester
32+
* Sanjiban Sengupta [[email protected]](mailto:[email protected]) CERN/UManchester
3133
* James Smith [[email protected]](mailto:[email protected]) UManchester
3234
* Mayank Sharma [[email protected]](mailto:[email protected]) UMich
3335
* Simon Spannagel [[email protected]](mailto:[email protected]) DESY
