
Commit cd84675

Merge branch 'main' into gsoc25-cvmfs
2 parents 59151bd + 55cea4d commit cd84675

18 files changed

+408 -2 lines changed

.github/config/mdcheck.json

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@
 {
 "pattern": "https://ariostas.com"
 },
+{
+"pattern": "https://www.monash.edu"
+},
 {
 "pattern": "https://indico.desy.de"
 }

_gsocorgs/2025/imperialcollege.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: "Imperial College London"
author: "Enric Tejedor"
layout: default
organization: ImperialCollege
logo: Imperial-College-London2.png
description: |
  [Imperial College London](https://www.imperial.ac.uk/) is a world top ten university with an international reputation for excellence in teaching and research. Consistently rated amongst the world's best universities, Imperial is committed to developing the next generation of researchers, scientists and academics through collaboration across disciplines. Located in the heart of London, Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.
---

{% include gsoc_proposal.ext %}

_gsocorgs/2025/monashuniversity.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: "Monash University"
author: "Ulrik Egede"
layout: default
organization: MonashUniversity
logo: Monash.png
description: |
  [Monash University](https://www.monash.edu) is one of Australia's leading universities and ranks among the world's top 100. It helps change lives through research and education. Its large Faculty of Science is active across all areas of science, from particle physics to the development of new methods for identifying rare earth minerals.
---

{% include gsoc_proposal.ext %}
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
project: Ganga
layout: default
logo: ganga_logo_150dpi.png
description: |
  [Ganga](https://github.com/ganga-devs/ganga) is a computational task-management tool that allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources.
  Ganga was developed to solve a problem that is increasingly common in scientific projects: researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources.
---

{% include gsoc_project.ext %}
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
project: Geant4
layout: default
logo: Geant4-logo.png
description: |
  [Geant4](https://geant4.web.cern.ch/) is a toolkit for the simulation of the
  passage of particles through matter. Its areas of application include high
  energy, nuclear and accelerator physics, as well as studies in medical and space
  science. The three main reference papers for Geant4 are published in Nuclear
  Instruments and Methods in Physics Research A 506 (2003) 250-303, IEEE
  Transactions on Nuclear Science 53 No. 1 (2006) 270-278 and Nuclear Instruments
  and Methods in Physics Research A 835 (2016) 186-225.
summary: |
  [Geant4](https://geant4.web.cern.ch/) is a toolkit for the simulation of the passage of particles through matter.
---

{% include gsoc_project.ext %}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
project: Patatrack
layout: default
logo: patatrack-logo.png
description: |
  The [Patatrack](https://patatrack.web.cern.ch/patatrack/index.html) project was started in 2016 by a group of people with expertise in areas such as software optimization, heterogeneous computing, track reconstruction and the High Level Trigger (HLT) of the CMS experiment at CERN. The goal was to demonstrate that part of the HLT reconstruction could be efficiently offloaded to machines equipped with GPUs for parallel execution. Patatrack developments have since been integrated into the CMS event reconstruction software, and the project now focuses on exploring innovative software and hardware technologies that bring smart software closer to the detector read-out at CERN experiments.
---
{% include gsoc_project.ext %}
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
title: Development of an auto-tuning tool for the CLUEstering library
layout: gsoc_proposal
project: Patatrack
year: 2025
organization: CERN
---

## Description
[CLUE][clue] is a fast and fully parallelizable density-based clustering algorithm, optimized for
high-occupancy scenarios, where the number of clusters is much larger than the average number of hits
in a cluster ([Rovere et al. 2020][cluepaper]). The algorithm uses a grid spatial index for fast
neighbor queries, and its execution time scales linearly with the number of points within the range
considered. It is currently used in the CMS and CLIC event reconstruction software for clustering
calorimetric hits in two dimensions based on their energy. The CLUE algorithm has been generalized to
an arbitrary number of dimensions and to a wider range of applications in [CLUEstering][cluestering],
a general-purpose clustering library with a backend implemented in C++ and a Python interface for
easier use. Thanks to the [Alpaka][alpakapaper] performance portability library, the backend can run
on multiple platforms (serial, TBB, GPUs, etc.). One feature currently missing from CLUEstering, and
one that would be extremely useful for every user, is automatic tuning of the parameters: given the
expected number of clusters, the tool would compute the combination of input parameters that yields
the best clustering.

For this task, one option to explore is “The Optimizer”, a Python library developed by the
Patatrack group of the CMS experiment that provides a collection of optimization algorithms,
in particular MOPSO (Multi-Objective Particle Swarm Optimization).
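
To illustrate how a grid spatial index speeds up neighbor queries, here is a minimal 2D sketch in plain Python. Points are binned into square tiles of side `dc`, so a query within radius `dc` only needs to scan the 3 x 3 block of tiles around the query point. The helper names `build_grid`, `neighbors_within` and `local_density` are hypothetical. The density here is an unweighted neighbor count, a simplified stand-in for CLUE's energy-weighted density. The actual CLUEstering C++/Alpaka implementation is organized differently.

```python
import math
from collections import defaultdict


def build_grid(points, dc):
    """Bin 2D points into square tiles of side dc (hypothetical helper)."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / dc), math.floor(y / dc))].append(idx)
    return grid


def neighbors_within(points, grid, dc, i):
    """Return sorted indices of points within distance dc of point i (excluding i).

    Because tiles have side dc, every candidate lies in the 3 x 3 block
    of tiles centred on the tile that contains point i.
    """
    xi, yi = points[i]
    tx, ty = math.floor(xi / dc), math.floor(yi / dc)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((tx + dx, ty + dy), []):
                if j != i and math.dist(points[i], points[j]) <= dc:
                    result.append(j)
    return sorted(result)


def local_density(points, dc):
    """Neighbor count within dc for every point (unweighted stand-in for CLUE's density)."""
    grid = build_grid(points, dc)
    return [len(neighbors_within(points, grid, dc, i)) for i in range(len(points))]


pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.6), (5.0, 5.0), (5.2, 5.1)]
print(local_density(pts, dc=1.0))  # -> [2, 2, 2, 1, 1]
```

Each query touches only nine tiles instead of all points. This is what keeps the cost roughly linear in the number of points when the occupancy per tile is bounded.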

## Expected results
* Consider the best techniques and tools for the task
* Develop an auto-tuning tool for the parameters of CLUEstering
* Test it on a wide range of commonly used datasets
* Benchmark and profile to identify the bottlenecks of the tool and optimize it
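
A minimal sketch of what the auto-tuning loop could look like. It assumes only that the clustering can be run with candidate parameters and reports how many clusters it found. A toy 1D gap-based clustering (`toy_cluster_count`, hypothetical) stands in for CLUEstering, and a plain grid search over a single parameter `dc` stands in for an optimizer such as MOPSO. A real tool would tune several CLUEstering parameters at once and would also score clustering quality, not just the cluster count.

```python
def toy_cluster_count(values, dc):
    """Hypothetical stand-in for a clustering run: sorted 1D values start a
    new cluster whenever the gap to the previous value exceeds dc."""
    ordered = sorted(values)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > dc)


def autotune(values, expected_clusters, candidates):
    """Return the candidate dc whose cluster count is closest to the target.

    min() keeps the first candidate among equally good ones.
    """
    return min(
        candidates,
        key=lambda dc: abs(toy_cluster_count(values, dc) - expected_clusters),
    )


data = [0.0, 0.1, 0.2, 3.0, 3.1, 7.0, 7.2, 7.3]
best = autotune(data, expected_clusters=3, candidates=[0.05, 0.5, 2.0, 5.0])
print(best)  # -> 0.5
```

Replacing the grid search with a multi-objective optimizer such as MOPSO would mean scoring each parameter set on several objectives (for example, cluster-count mismatch and a quality metric). The search would then keep the Pareto front rather than a single minimum.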

## Evaluation Task
Interested students please contact [email protected]

## Technologies
* C++, Python

## Desirable skills
* Experience with development in C++17/20
* Experience with GPU computing
* Experience with machine learning and optimization techniques
* Experience with development of Python libraries

## Additional information
* Difficulty level (low, medium, hard): medium
* Duration: 350 hours
* Mentor availability: June-October

## Mentors
* **[Simone Balducci](mailto:[email protected]) (CERN UNIBO)**
* [Felice Pantaleo](mailto:[email protected]) (CERN)

## Links
* [CLUE][clue]
* [CLUEstering][cluestering]
* [Alpaka][alpaka]

[clue]: https://gitlab.cern.ch/kalos/clue
[cluestering]: https://github.com/cms-patatrack/CLUEstering
[cluepaper]: https://www.frontiersin.org/articles/10.3389/fdata.2020.591315/full
[alpakapaper]: https://arxiv.org/abs/1602.08477
[alpaka]: https://github.com/alpaka-group/alpaka

_gsocproposals/2025/proposal_Clad-STLConcurrency.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ An example demonstrating the use of differentiation of codes utilizing paralleli
 #include <numeric>
 #include <thread>
 #include <vector>
-#include "clad/Differentiator/Differentiator.h"q
+#include "clad/Differentiator/Differentiator.h"
 
 using VectorD = std::vector<double>;
 using MatrixD = std::vector<VectorD>;
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
project: Ganga
title: Incorporate a Large Language Model to assist users
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors at tens of different locations across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to allow for the management of such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

As Ganga is a scripting and command-line interface, some users will naturally have problems getting the syntax right. To solve this, they often spend time searching through mailing lists, FAQs and discussion fora, or simply wait for a more experienced coder to debug their problem. The idea of this project is to integrate a Large Language Model (LLM) into the Ganga command prompt. This should allow users to describe in words what they would like to do and get an example that they can incorporate. It should also intercept exceptions thrown by the Ganga interface, help the user understand them and propose solutions.
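
As a rough illustration of the exception-interception idea, the sketch below installs a `sys.excepthook` that turns an uncaught exception into a prompt for an assistant. `ask_llm` is a hypothetical placeholder for whatever client talks to the model server. It is stubbed here so the example runs offline. If the Ganga prompt is IPython-based, IPython offers its own hook (`InteractiveShell.set_custom_exc`), which would be the more natural integration point than `sys.excepthook`.

```python
import sys
import traceback


def ask_llm(prompt: str) -> str:
    """Hypothetical LLM client; replace with a real call to the model server."""
    return "[assistant reply to a prompt of %d characters]" % len(prompt)


def build_error_prompt(exc_type, exc_value, tb) -> str:
    """Format an exception and its traceback as a question for the assistant."""
    details = "".join(traceback.format_exception(exc_type, exc_value, tb))
    return (
        "A Ganga user hit the following error. Explain it in plain language "
        "and suggest a fix.\n\n" + details
    )


def llm_excepthook(exc_type, exc_value, tb):
    # Print the normal traceback first so no information is hidden from the user.
    sys.__excepthook__(exc_type, exc_value, tb)
    if issubclass(exc_type, KeyboardInterrupt):
        return  # Ctrl-C is not an error worth explaining.
    print("\nAssistant:", ask_llm(build_error_prompt(exc_type, exc_value, tb)))


# Route every uncaught exception through the assistant-aware hook.
sys.excepthook = llm_excepthook
```

Keeping the original traceback visible and only appending the assistant's explanation means that a wrong or slow LLM answer never hides the real error.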

We have an interface based on Ollama that will build a RAG (retrieval-augmented generation) database containing extra information about Ganga that was not available for training the underlying LLM.
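
The retrieval step of a RAG can be illustrated with a deliberately simple sketch. It ranks stored text snippets by word overlap with the user's question and prepends the best matches to the prompt. The snippet texts, the bag-of-words cosine scoring and the `retrieve`/`build_prompt` helpers are assumptions for illustration. A production RAG would normally use learned embeddings and a vector store rather than word counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, snippets, k=2):
    """Return the k snippets most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(snippets, key=lambda s: cosine(q, bag_of_words(s)), reverse=True)
    return ranked[:k]


def build_prompt(question, snippets, k=2):
    """Prepend the retrieved snippets to the question as context for the LLM."""
    context = "\n".join("- " + s for s in retrieve(question, snippets, k))
    return f"Use this Ganga documentation:\n{context}\n\nQuestion: {question}"


# Hypothetical snippets, e.g. extracted from manuals or mailing-list threads.
docs = [
    "Use j.submit() to submit a job to its backend.",
    "The Dirac backend sends jobs to the grid.",
    "Mailing list: how to change the default editor.",
]
print(build_prompt("How do I submit a job?", docs, k=1))
```

Testing which sources are most useful for the RAG then amounts to comparing answer quality when different snippet collections are passed as `docs`.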

## Task ideas
* Integrate the interaction with the LLM and RAG into Ganga.
* Integrate past input and output in the CLI to provide context for the LLM.
* Set up a server so that the LLM can run remotely, requiring minimal installation by the user.
* Test which samples are most useful to add to the RAG (mailing list discussions, manuals, instant messages).
* Develop continuous integration tests that ensure the LLM integration keeps working.
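
For the task of feeding past CLI input and output to the LLM, one simple approach is a bounded history buffer. It keeps the last few command/output pairs and renders them into the prompt. The `SessionContext` class, its default of five entries and the output truncation length are hypothetical choices made for illustration.

```python
from collections import deque


class SessionContext:
    """Keep the most recent (command, output) pairs from an interactive session."""

    def __init__(self, max_entries=5, max_output_chars=200):
        self.history = deque(maxlen=max_entries)  # oldest entries drop off automatically
        self.max_output_chars = max_output_chars

    def record(self, command, output):
        # Truncate long outputs so the prompt stays within the model's context window.
        self.history.append((command, output[: self.max_output_chars]))

    def as_prompt(self, question):
        lines = ["Recent Ganga session:"]
        for command, output in self.history:
            lines.append(f">>> {command}")
            if output:
                lines.append(output)
        lines.append(f"\nUser question: {question}")
        return "\n".join(lines)


ctx = SessionContext(max_entries=2)
ctx.record("j = Job()", "")
ctx.record("j.backend = Dirac()", "")
ctx.record("j.submit()", "<error message>")
print(ctx.as_prompt("Why did my submission fail?"))
```

A fixed-size `deque` keeps the context cheap and predictable. Deciding how many entries are actually helpful is part of the evaluation the task list describes.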

## Expected results
For the scientific users of Ganga, this will speed up their development cycle, as they will get faster responses to their usage queries.

As a student, you will gain experience with the challenges of large-scale computing, where a large processing chain may have thousands of tasks running in parallel, some of which take several days to process or fail intermittently. You will also gain experience with how LLMs can be integrated directly into projects to help users work with the CLI and understand error messages.

## Evaluation Task
Interested students please contact Ulrik (see contact below) to ask questions and for an evaluation task.

## Requirements
Python programming (advanced), Linux command line experience (intermediate), use of git for code development and continuous integration testing (intermediate)

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
project: Ganga
title: Implement a deprecation system to keep code up to date
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors at tens of different locations across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to allow for the management of such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

Ganga has been developed over many years, and parts of its API have become redundant. This means that, for a period of time, the old and now deprecated API coexists with the new way of doing things. At the moment Ganga lacks a formal way of deprecating code. As a result, warnings about the use of deprecated features are non-uniform, and some very old code has never been cleaned up.

The idea of this project is to formalise the way code is declared deprecated and then use continuous integration to ensure that the code is eventually deleted.

## Task ideas
* Have a well-defined way of marking plugins, functions, etc. as deprecated, with a warning about when they will be removed. Building on top
  of the Python package [deprecated](https://pypi.org/project/Deprecated/) is one option.
* Run tests in the testing framework that alert developers when certain parts of the code can be removed.
* Apply a similar system in the testing framework to identify when deprecated Python features are used, ahead of moving to a new Python version.
* Apply the deprecation system to parts of the code that are already deprecated.
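
A minimal sketch of such a system using only the standard library. A decorator records the version in which a function will be removed and emits a `DeprecationWarning` on every call. A registry then lets a CI test fail once the current release reaches that version. `deprecated_until`, `DEPRECATION_REGISTRY`, `overdue`, `old_submit` and the version strings are hypothetical names for illustration. The `Deprecated` package mentioned above provides a ready-made `@deprecated(reason=..., version=...)` decorator that could replace the warning part.

```python
import functools
import warnings

# Hypothetical registry: qualified name -> version in which it will be removed.
DEPRECATION_REGISTRY = {}


def _as_tuple(version):
    """Turn '8.7.0' into (8, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def deprecated_until(remove_in, replacement=None):
    """Mark a function as deprecated and scheduled for removal in `remove_in`."""
    def decorator(func):
        DEPRECATION_REGISTRY[func.__qualname__] = remove_in
        message = f"{func.__qualname__} is deprecated and will be removed in version {remove_in}"
        if replacement:
            message += f"; use {replacement} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def overdue(current_version):
    """Names whose removal version has been reached; a CI test can assert this is empty."""
    now = _as_tuple(current_version)
    return sorted(name for name, v in DEPRECATION_REGISTRY.items() if _as_tuple(v) <= now)


@deprecated_until("9.0.0", replacement="new_submit")
def old_submit(job):
    return f"submitted {job}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_submit("job-1")
print(caught[0].message)
print(overdue("8.7.0"), overdue("9.0.0"))  # -> [] ['old_submit']
```

Running the test suite with `python -W error::DeprecationWarning` turns any use of deprecated code into a test failure. This includes deprecated features of Python itself, which covers the idea of catching them before moving to a new Python version.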

## Expected results
Obtain a cleaner code base in which very old, long-deprecated code is no longer present. Provide end users with consistent warnings about their use of deprecated code and about when it will be removed.

As a student, you will gain experience with the challenges of large-scale computing, where a large processing chain may have thousands of tasks running in parallel, some of which take several days to process or fail intermittently. You will also gain experience working within a large code base that has gone through many development cycles.

## Evaluation Task
Interested students please contact Ulrik (see contact below) to ask questions and for an evaluation task.

## Requirements
Python programming (advanced), Linux command line experience (intermediate), use of git for code development and continuous integration testing (intermediate)

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)
