|
| 1 | +--- |
| 2 | +title: Probabilistic circuit for lossless HEP data compression |
| 3 | +layout: gsoc_proposal |
| 4 | +project: Baler |
| 5 | +year: 2025 |
| 6 | +organization: |
| 7 | + - CERN |
| 8 | +difficulty: medium |
| 9 | +duration: 350 |
| 10 | +mentor_avail: June-October (with 3 weeks mentor vacation where student will work independently with minimal guidance) |
| 11 | +--- |
| 12 | + |
| 13 | +## Short description of the project |
| 14 | +Neural data compression is an efficient solution for reducing the cost and computational resources of data storage in many LHC experiments. |
| 15 | +However, it struggles to reconstruct the compressed data exactly, because most neural compression algorithms are lossy: decompression does not recover all of the original information. |
| 16 | +On the other hand, lossless neural compression schemes (VAE, IDF) achieve a lower compression ratio and are not fast enough for file I/O. |
| 17 | +The goal of this project is to overcome these disadvantages by using probabilistic circuits for HEP data compression. |
| 18 | + |
| 19 | +## Task ideas |
| 20 | + |
| 21 | +* Implement a probabilistic circuit in PyTorch |
| 22 | +* Train the model and compress HEP data (Higgs data, TopQuark Dataset) |
| 23 | +* Measure the cost and quantify the optimal compression ratio of the probabilistic circuit |
| 24 | +* Benchmark the model and compare the results with AE and Transformer baselines |
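
As a rough illustration of the first and third tasks, the sketch below evaluates a tiny probabilistic circuit (a mixture of two fully factorized components) and derives an ideal code length of `-log2 p(x)` bits per record, which an entropy coder such as arithmetic coding can approach closely. Everything here is an assumption made for illustration: the class names `Leaf`, `Product`, and `Sum`, the toy distributions, and the choice of the Python standard library instead of PyTorch so that the example stays self-contained. It is not the Baler implementation.

```python
import math


class Leaf:
    """Categorical leaf over a single discrete variable (hypothetical helper)."""

    def __init__(self, var, probs):
        self.var = var      # index of the variable this leaf models
        self.probs = probs  # probs[v] = P(variable == v)

    def log_prob(self, x):
        return math.log(self.probs[x[self.var]])


class Product:
    """Product node: its children cover disjoint variables, so log-probs add."""

    def __init__(self, children):
        self.children = children

    def log_prob(self, x):
        return sum(child.log_prob(x) for child in self.children)


class Sum:
    """Sum node: a weighted mixture of children defined over the same variables."""

    def __init__(self, weights, children):
        assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
        self.weights = weights
        self.children = children

    def log_prob(self, x):
        # Log-sum-exp keeps the mixture numerically stable in log space.
        terms = [math.log(w) + c.log_prob(x)
                 for w, c in zip(self.weights, self.children)]
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))


def ideal_code_length_bits(circuit, x):
    """Ideal lossless code length of record x under the model, in bits."""
    return -circuit.log_prob(x) / math.log(2)


def compression_ratio(circuit, records, raw_bits_per_record):
    """Raw size divided by the ideal coded size under the model."""
    coded_bits = sum(ideal_code_length_bits(circuit, x) for x in records)
    return raw_bits_per_record * len(records) / coded_bits


# Toy circuit over two variables, each taking 4 values (2 raw bits per variable).
low = [0.7, 0.1, 0.1, 0.1]   # component that favours value 0
high = [0.1, 0.1, 0.1, 0.7]  # component that favours value 3
circuit = Sum(
    [0.5, 0.5],
    [
        Product([Leaf(0, low), Leaf(1, low)]),
        Product([Leaf(0, high), Leaf(1, high)]),
    ],
)

records = [(0, 0), (3, 3), (0, 0), (3, 3)]
print(compression_ratio(circuit, records, raw_bits_per_record=4))
```

A PyTorch version would typically replace the fixed probability tables with learnable parameters and evaluate whole batches at once, but the structure of sum, product, and leaf nodes would stay the same. Because the circuit yields exact probabilities, the code length above is a property of the model itself rather than an approximation.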
| 25 | + |
| 26 | +## Expected results |
| 27 | + |
| 28 | +Improved compression performance, with documentation and figures of merit, which may include: |
| 29 | + * An implemented probabilistic circuit model |
| 30 | + * Documentation of the benchmarks and compression experiments on HEP data |
| 31 | + |
| 32 | +## Requirements |
| 33 | + |
| 34 | +Required: Good knowledge of UNIX, Python, matplotlib, PyTorch, Julia, Pandas, ROOT. |
| 35 | + |
| 36 | +## Mentors |
| 37 | + * ***[Leonid Didukh](mailto:[email protected])*** |
| 38 | + * [Caterina Doglioni](mailto:[email protected]) as backup mentor |
| 39 | + |
| 40 | +## Links |
| 41 | + |
| 42 | +* Previous work: |
| 43 | + |
| 44 | + * [GSoC 2021 project: Zenodo entry by George Dialektakis](https://zenodo.org/record/5482611#.Y-I28S2l3fa) |
| 45 | + * [Baler -- Machine Learning Based Compression of Scientific Data](https://arxiv.org/abs/2305.02283) |
| 47 | + |
| 48 | + * [ROOT](https://root.cern/) |
| 49 | + * [Jupyter](http://jupyter.org) |
| 50 | + * [Lossless compression with probabilistic circuits](https://arxiv.org/pdf/2111.11632) |
| 51 | + * [iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder](https://arxiv.org/pdf/2111.00965) |
| 52 | + * [Integer Discrete Flows and Lossless Compression](https://arxiv.org/pdf/1905.07376) |
| 53 | + |
| 54 | + |
| 55 | + |