Merged
Changes from 3 commits
8 changes: 8 additions & 0 deletions _gsocprojects/2025/project_Patatrack.md
@@ -0,0 +1,8 @@
---
project: Patatrack
layout: default
logo: patatrack.png
description: |
The [Patatrack](https://patatrack.web.cern.ch/patatrack/index.html) project was started in 2016 by a group of people with various areas of expertise, such as software optimization, heterogeneous computing, track reconstruction and the High Level Trigger (HLT) of the CMS experiment at CERN. The goal was to demonstrate that part of the HLT reconstruction could be efficiently offloaded to machines equipped with GPUs for parallel execution. Nowadays, Patatrack developments have been integrated into the CMS software for event reconstruction, and the project focuses on exploring innovative software and hardware technologies to bring smart software closer to the detector readout at CERN experiments.
---
{% include gsoc_project.ext %}
63 changes: 63 additions & 0 deletions _gsocproposals/2025/proposal_CLUEsteringAutotuning.md
@@ -0,0 +1,63 @@
---
title: Development of an auto-tuning tool for the CLUEstering library
layout: gsoc_proposal
project: Patatrack
year: 2025
organization: CERN
---

## Description
[CLUE][clue] is a fast and fully parallelizable density-based clustering algorithm, optimized for
high-occupancy scenarios, where the number of clusters is much larger than the average number of hits
in a cluster ([Rovere et al. 2020][cluepaper]). The algorithm uses a grid spatial index for fast querying of
neighbors, and its timing scales linearly with the number of points within the range considered. It is
currently used in the CMS and CLIC event reconstruction software for clustering calorimetric hits in
two dimensions based on their energy. The CLUE algorithm has been generalized to an arbitrary
number of dimensions and to a wider range of applications in [CLUEstering][cluestering], a general-purpose
clustering library with a C++ backend and a Python interface for
easier use. The library can run on multiple backends (serial, TBB, GPUs, etc.) thanks
to the [Alpaka][alpakapaper] performance portability library. One feature that CLUEstering currently lacks,
and that would be extremely useful for every user, is automatic tuning of the parameters: given
the expected number of clusters, the tool would compute the combination of input parameters that results in the best
clustering.
For this task, one of the options to be explored is “The Optimizer”, a Python library developed by
the Patatrack group of the CMS experiment, which provides a collection of optimization algorithms,
in particular MOPSO (Multi-Objective Particle Swarm Optimization).
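
To make the idea concrete, here is a minimal sketch of what parameter auto-tuning against an expected number of clusters could look like. It is only an illustration under several assumptions: it does not use CLUEstering or The Optimizer, the clustering is a simplified density-based stand-in rather than CLUE itself, an exhaustive grid search replaces MOPSO, and every name in it (`build_grid`, `neighbors`, `count_clusters`, `tune`, and the parameters `dc` and `rho_c`) is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell):
    """Map each grid cell (tuple of ints) to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(i)
    return grid


def neighbors(points, grid, cell, i, radius):
    """Indices of points within `radius` of point i (i included), scanning nearby cells only."""
    reach = math.ceil(radius / cell)
    base = tuple(math.floor(c / cell) for c in points[i])
    found = []
    for offset in product(range(-reach, reach + 1), repeat=len(base)):
        key = tuple(b + o for b, o in zip(base, offset))
        for j in grid.get(key, ()):
            if math.dist(points[i], points[j]) <= radius:
                found.append(j)
    return found


def count_clusters(points, dc, rho_c):
    """Toy stand-in for a clustering run (NOT CLUE): points with at least
    `rho_c` neighbors within `dc` are dense, and dense points closer than
    `dc` to each other are linked into the same cluster."""
    grid = build_grid(points, dc)
    near = [neighbors(points, grid, dc, i, dc) for i in range(len(points))]
    dense = {i for i, n in enumerate(near) if len(n) >= rho_c}
    seen, clusters = set(), 0
    for seed in dense:
        if seed in seen:
            continue
        clusters += 1
        seen.add(seed)
        stack = [seed]
        while stack:  # depth-first walk over linked dense points
            cur = stack.pop()
            for j in near[cur]:
                if j in dense and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return clusters


def tune(points, expected_clusters, dc_values, rho_values):
    """Grid search (assumed here in place of MOPSO): return (error, dc, rho_c)
    for the first combination whose cluster count is closest to the target."""
    best = None
    for dc, rho_c in product(dc_values, rho_values):
        error = abs(count_clusters(points, dc, rho_c) - expected_clusters)
        if best is None or error < best[0]:
            best = (error, dc, rho_c)
    return best


def make_lattice_blobs(centers, half_width=2, spacing=0.4):
    """Deterministic test data: one small square lattice of points per center."""
    return [(cx + spacing * i, cy + spacing * j)
            for cx, cy in centers
            for i in range(-half_width, half_width + 1)
            for j in range(-half_width, half_width + 1)]


points = make_lattice_blobs([(0, 0), (10, 0), (0, 10)])
error, dc, rho_c = tune(points, 3, [0.2, 0.5, 20.0], [3, 5])
print(error, dc, rho_c)  # prints: 0 0.5 3
```

In this toy setup a too-small `dc` finds no dense points and a too-large `dc` merges everything into one cluster, so only the middle value reproduces the three expected clusters. A real tool would presumably optimize more than one objective (for instance cluster count and some quality score), which is where a multi-objective optimizer such as MOPSO would come in.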

## Expected results
* Consider the best techniques and tools for the task
* Develop an auto-tuning tool for the parameters of CLUEstering
* Test it on a wide range of commonly used datasets
* Benchmark and profile to identify the bottlenecks of the tool and optimize it

## Evaluation Task
Interested students should contact [email protected]

## Technologies
* C++, Python

## Desirable skills
* Experience with development in C++17/20
* Experience with GPU computing
* Experience with machine learning and optimization techniques
* Experience with development of Python libraries

## Additional information
Difficulty level (low, medium, hard): medium
Duration: 350 hours
Mentor availability: June-October

## Mentors
* **[Simone Balducci](mailto:[email protected]) (CERN UNIBO)**
* [Felice Pantaleo](mailto:[email protected]) (CERN)

## Links
* [CLUE][clue]
* [CLUEstering][cluestering]
* [Alpaka][alpaka]

[clue]: https://gitlab.cern.ch/kalos/clue
[cluestering]: https://github.com/cms-patatrack/CLUEstering
[cluepaper]: https://www.frontiersin.org/articles/10.3389/fdata.2020.591315/full
[alpakapaper]: https://arxiv.org/abs/1602.08477
[alpaka]: https://github.com/alpaka-group/alpaka
2 changes: 2 additions & 0 deletions gsoc/2025/mentors.md
@@ -6,6 +6,7 @@ layout: plain
**Note for contributors:** entries must be sorted in **last name** alphabetic order

## Full Mentor List (Name, Email, Org)
* Simone Balducci [[email protected]](mailto:[email protected]) CERN
* Martin Barisits [[email protected]](mailto:[email protected]) CERN
* Lukas Breitwieser [[email protected]](mailto:[email protected]) CERN
* Andy Buckley [[email protected]](mailto:[email protected]) UofGlasgow
@@ -18,6 +19,7 @@ layout: plain
* David Lange [[email protected]](mailto:[email protected]) CompRes
* Serguei Linev [[email protected]](mailto:[email protected]) GSI
* Peter McKeown [[email protected]](mailto:[email protected]) CERN
* Felice Pantaleo [[email protected]](mailto:[email protected]) CERN
* Giacomo Parolini [[email protected]](mailto:[email protected]) CERN
* Alexander Penev [[email protected]](mailto:[email protected]) CompRes/University of Plovdiv, BG
* Mayank Sharma [[email protected]](mailto:[email protected]) UMich