Commit 2035f96

authored
Create proposal_Baler-ProbabilisticCircuit.md (#1687)
1 parent a37e41c commit 2035f96

1 file changed

+55
-0
lines changed

@@ -0,0 +1,55 @@
1+
---
2+
title: Probabilistic circuit for lossless HEP data compression
3+
layout: gsoc_proposal
4+
project: Baler
5+
year: 2025
6+
organization:
7+
- CERN
8+
difficulty: medium
9+
duration: 350
10+
mentor_avail: June-October (with 3 weeks of mentor vacation, during which the student will work independently with minimal guidance)
11+
---
12+
13+
## Short description of the project
14+
Neural data compression is an efficient way to reduce the cost and computational resources required for data storage in many LHC experiments.
15+
However, it struggles to reconstruct the compressed data exactly, because most neural compression algorithms are lossy: decompression does not recover all of the original information.
16+
On the other hand, lossless neural compression schemes (VAE, IDF) achieve a lower compression ratio and are not fast enough for file I/O.
17+
The goal of this project is to overcome these disadvantages of neural compression algorithms by using a probabilistic circuit for HEP data compression.
18+
19+
## Task ideas
20+
21+
* Implement a probabilistic circuit in PyTorch
22+
* Train the model on HEP data (Higgs data, TopQuark Dataset) and use it to compress that data
23+
* Measure the cost of compression and quantify the best compression ratio the probabilistic circuit can achieve
24+
* Benchmark the model and compare the results with autoencoder (AE) and Transformer baselines
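
To make the tasks above more concrete, here is a minimal sketch of a probabilistic circuit and of the compression ratio it implies. It is an illustration only, not Baler's implementation: it uses plain Python instead of PyTorch so it runs without dependencies, and the class names (`Leaf`, `Product`, `Sum`), the toy parameters, and the one-byte-per-feature raw size are all assumptions made for this example. The circuit is a mixture (sum node) of factorized distributions (product nodes over categorical leaves). The ideal lossless code length of a sample `x` under the model is `-log2 p(x)` bits, which an entropy coder such as an arithmetic coder can approach.

```python
import math


class Leaf:
    """Categorical distribution over one discrete feature (hypothetical helper)."""

    def __init__(self, var, probs):
        self.var = var      # index of the feature this leaf models
        self.probs = probs  # probs[v] = P(x[var] == v)

    def log_prob(self, x):
        return math.log(self.probs[x[self.var]])


class Product:
    """Product node: children cover disjoint features, so log-probabilities add."""

    def __init__(self, children):
        self.children = children

    def log_prob(self, x):
        return sum(child.log_prob(x) for child in self.children)


class Sum:
    """Sum node: weighted mixture of children, evaluated with log-sum-exp."""

    def __init__(self, weights, children):
        self.weights = weights
        self.children = children

    def log_prob(self, x):
        terms = [math.log(w) + c.log_prob(x)
                 for w, c in zip(self.weights, self.children)]
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms))


def ideal_code_length_bits(circuit, data):
    """Total Shannon code length of the data under the circuit, in bits."""
    return sum(-circuit.log_prob(x) / math.log(2) for x in data)


# Toy circuit over two binary features (parameters chosen by hand, not trained).
circuit = Sum(
    [0.5, 0.5],
    [
        Product([Leaf(0, [0.9, 0.1]), Leaf(1, [0.8, 0.2])]),
        Product([Leaf(0, [0.2, 0.8]), Leaf(1, [0.1, 0.9])]),
    ],
)

data = [(0, 0), (1, 1), (0, 0), (1, 1)]

# Assumption: the uncompressed file stores each feature as one byte.
raw_bits = 8 * 2 * len(data)
model_bits = ideal_code_length_bits(circuit, data)
ratio = raw_bits / model_bits

print(f"raw: {raw_bits} bits, model bound: {model_bits:.2f} bits, ratio: {ratio:.1f}")
```

In a PyTorch version the mixture weights and leaf parameters would be learnable tensors trained by maximizing the log-likelihood, and the measured ratio would also have to account for the coder overhead and the size of the stored model.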
25+
26+
## Expected results
27+
28+
Improved compression performance, with documentation and figures of merit that may include:
29+
* An implementation of the probabilistic circuit model
30+
* Documentation of the benchmarks and compression experiments on HEP data
31+
32+
## Requirements
33+
34+
Required: Good knowledge of UNIX, Python, matplotlib, PyTorch, Julia, pandas, ROOT.
35+
36+
## Mentors
37+
* ***[Leonid Didukh](mailto:[email protected])***
38+
* [Caterina Doglioni](mailto:[email protected]) as backup mentor
39+
40+
## Links
41+
42+
* Previous work:
43+
44+
* [GSoC 2021 project: Zenodo entry by George Dialektakis](https://zenodo.org/record/5482611#.Y-I28S2l3fa)
45+
* [Baler -- Machine Learning Based Compression of Scientific Data
46+
](https://arxiv.org/abs/2305.02283)
47+
48+
* [ROOT](https://root.cern/)
49+
* [Jupyter](http://jupyter.org)
50+
* [Lossless compression with probabilistic circuits](https://arxiv.org/pdf/2111.11632)
51+
* [iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder](https://arxiv.org/pdf/2111.00965)
52+
* [Integer Discrete Flows and Lossless Compression](https://arxiv.org/pdf/1905.07376)
53+
54+
55+
