Merge branch 'main' into gsoc25-cvmfs

vvolkl · web-flow · commit cd84675c1a6c · 2025-02-11T12:01:01.000+01:00
diff --git a/.github/config/mdcheck.json b/.github/config/mdcheck.json
@@ -12,6 +12,9 @@
     {
       "pattern": "https://ariostas.com"
     },
+    {
+      "pattern": "https://www.monash.edu"
+    },
     {
       "pattern": "https://indico.desy.de"
     }
diff --git a/_gsocorgs/2025/imperialcollege.md b/_gsocorgs/2025/imperialcollege.md
@@ -0,0 +1,11 @@
+---
+title: "Imperial College London"
+author: "Enric Tejedor"
+layout: default
+organization: ImperialCollege
+logo: Imperial-College-London2.png
+description: |
+  [Imperial College London](https://www.imperial.ac.uk/) is a world top ten university with an international reputation for excellence in teaching and research. Consistently rated amongst the world's best universities, Imperial is committed to developing the next generation of researchers, scientists and academics through collaboration across disciplines. Located in the heart of London, Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.
+---
+
+{% include gsoc_proposal.ext %}
diff --git a/_gsocorgs/2025/monashuniversity.md b/_gsocorgs/2025/monashuniversity.md
@@ -0,0 +1,11 @@
+---
+title: "Monash University"
+author: "Ulrik Egede"
+layout: default
+organization: MonashUniversity
+logo: Monash.png
+description: |
+  [Monash University](https://www.monash.edu) is one of Australia's leading universities and ranks among the world's top 100. It helps change lives through research and education, and has a large Faculty of Science that is active across all areas of science, from particle physics to the development of new methods for identifying rare earth minerals.
+---
+
+{% include gsoc_proposal.ext %}
diff --git a/_gsocprojects/2025/project_Ganga.md b/_gsocprojects/2025/project_Ganga.md
@@ -0,0 +1,11 @@
+---
+project: Ganga
+layout: default
+logo: ganga_logo_150dpi.png
+description: |
+   [Ganga](https://github.com/ganga-devs/ganga) is a computational task-management tool, which allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources.
+   Ganga has been developed to solve a problem increasingly common in scientific projects, which is that researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources.
+---
+
+{% include gsoc_project.ext %}
+
diff --git a/_gsocprojects/2025/project_Geant4.md b/_gsocprojects/2025/project_Geant4.md
@@ -0,0 +1,17 @@
+---
+project: Geant4
+layout: default
+logo: Geant4-logo.png
+description: |
+  [Geant4](https://geant4.web.cern.ch/) is a toolkit for the simulation of the
+  passage of particles through matter. Its areas of application include high
+  energy, nuclear and accelerator physics, as well as studies in medical and space
+  science. The three main reference papers for Geant4 are published in Nuclear
+  Instruments and Methods in Physics Research A 506 (2003) 250-303,  IEEE
+  Transactions on Nuclear Science 53 No. 1 (2006) 270-278 and Nuclear Instruments
+  and Methods in Physics Research A 835 (2016) 186-225.
+summary: |
+  [Geant4](https://geant4.web.cern.ch/) is a toolkit for the simulation of the passage of particles through matter.
+---
+
+{% include gsoc_project.ext %}
diff --git a/_gsocprojects/2025/project_Patatrack.md b/_gsocprojects/2025/project_Patatrack.md
@@ -0,0 +1,8 @@
+---
+project: Patatrack 
+layout: default
+logo: patatrack-logo.png
+description: |
+    The [Patatrack](https://patatrack.web.cern.ch/patatrack/index.html) project was started in 2016 by a group of people with various areas of expertise, such as software optimization, heterogeneous computing, track reconstruction and the High Level Trigger (HLT) at the CMS experiment at CERN. The goal was to demonstrate that part of the HLT reconstruction could be efficiently offloaded to machines equipped with GPUs for parallel execution. Nowadays, Patatrack developments have been integrated into the CMS software for event reconstruction, and the project focuses on the exploration of innovative software and hardware technologies to bring smart software closer to the detector read-out at CERN experiments.
+---
+{% include gsoc_project.ext %}
diff --git a/_gsocproposals/2025/proposal_CLUEsteringAutotuning.md b/_gsocproposals/2025/proposal_CLUEsteringAutotuning.md
@@ -0,0 +1,63 @@
+---
+title: Development of an auto-tuning tool for the CLUEstering library
+layout: gsoc_proposal
+project: Patatrack 
+year: 2025
+organization: CERN
+---
+
+## Description
+[CLUE][clue] is a fast and fully parallelizable density-based clustering algorithm, optimized for high-
+occupancy scenarios, where the number of clusters is much larger than the average number of hits
+in a cluster ([Rovere et al. 2020][cluepaper]). The algorithm uses a grid spatial index for fast querying of
+neighbors and its timing scales linearly with the number of points within the range considered. It is
+currently used in the CMS and CLIC event reconstruction software for clustering calorimetric hits in
+two dimensions based on their energy. The CLUE algorithm has been generalized to an arbitrary
+number of dimensions and to a wider range of applications in [CLUEstering][cluestering], a general purpose
+clustering library, with the backend implemented in C++ and a Python interface provided for
+easier use. The library can be executed on multiple backends (serial, TBB, GPUs, etc.) thanks
+to the [Alpaka][alpakapaper] performance portability library. One feature currently lacking from CLUEstering,
+and one that would be extremely useful for every user, is an auto-tuning tool for the parameters that, given
+the expected number of clusters, computes the combination of input parameters that results in the best
+clustering.  
+For this task, one of the options to be explored is “The Optimizer”, a Python library developed by
+the Patatrack group of the CMS experiment which provides a collection of optimization algorithms,
+in particular MOPSO (Multi-Objective Particle Swarm Optimization).
+
+## Expected results
+* Consider the best techniques and tools for the task
+* Develop an auto-tuning tool for the parameters of CLUEstering
+* Test it on a wide range of commonly used datasets
+* Benchmark and profile to identify the bottlenecks of the tool and optimize it
+
+## Evaluation Task
+Interested students please contact simone.balducci@cern.ch
+
+## Technologies
+* C++, Python
+
+## Desirable skills
+* Experience with development in C++17/20
+* Experience with GPU computing
+* Experience with machine learning and optimization techniques
+* Experience with development of Python libraries
+
+## Additional information
+* Difficulty level (low, medium, hard): medium
+* Duration: 350 hours
+* Mentor availability: June-October
+
+## Mentors
+  * **[Simone Balducci](mailto:simone.balducci@cern.ch) (CERN UNIBO)**
+  * [Felice Pantaleo](mailto:felice.pantaleo@cern.ch) (CERN)
+
+## Links
+  * [CLUE][clue]
+  * [CLUEstering][cluestering]
+  * [Alpaka][alpaka]
+
+[clue]: https://gitlab.cern.ch/kalos/clue
+[cluestering]: https://github.com/cms-patatrack/CLUEstering
+[cluepaper]: https://www.frontiersin.org/articles/10.3389/fdata.2020.591315/full
+[alpakapaper]: https://arxiv.org/abs/1602.08477
+[alpaka]: https://github.com/alpaka-group/alpaka
diff --git a/_gsocproposals/2025/proposal_Clad-STLConcurrency.md b/_gsocproposals/2025/proposal_Clad-STLConcurrency.md
@@ -31,7 +31,7 @@ An example demonstrating the use of differentiation of codes utilizing paralleli
 #include <numeric>
 #include <thread>
 #include <vector>
-#include "clad/Differentiator/Differentiator.h"q
+#include "clad/Differentiator/Differentiator.h"
 
 using VectorD = std::vector<double>;
 using MatrixD = std::vector<VectorD>;
diff --git a/_gsocproposals/2025/proposal_GangaAIassistant.md b/_gsocproposals/2025/proposal_GangaAIassistant.md
@@ -0,0 +1,45 @@
+---
+project: Ganga
+title: Incorporate a Large Language Model to assist users
+layout: gsoc_proposal
+year: 2025
+difficulty: medium
+duration: 350
+mentor_avail: May-November
+organization:
+  - ImperialCollege
+  - MonashUniversity
+---
+
+## Description
+The amount of data that is processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors, located at tens of different locations across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to allow for the management of such large calculations. It helps the user to prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together in the end.
+
+Because Ganga is a scripting and command line interface, some users will naturally have problems getting the syntax correct. To solve this, they will often spend time searching through mailing lists, FAQs and discussion fora, or indeed just wait for another more advanced coder to debug their problem. The idea of this project is to integrate a Large Language Model (LLM) into the command prompt in Ganga. This should allow the user to describe in words what they would like to do and get an example that they can incorporate. It should also intercept exceptions thrown by the Ganga interface, help the user to understand them and propose solutions.
+
+We have an interface based on ollama that builds a retrieval-augmented generation (RAG) store containing extra information about Ganga that was not available for the training of the underlying LLM.
+
+## Task ideas
+ * Integrate the interaction with the LLM and RAG into Ganga.
+ * Integrate past input and output in the CLI to provide context for the LLM.
+ * Set up a server such that the LLM can run on a remote server, requiring minimal installation by the user.
+ * Test which samples are most useful for adding to the RAG (mailing list discussions, manuals, instant messages).
+ * Develop continuous integration tests that ensure that the LLM integration keeps working.
+
+## Expected results
+For the scientific users of Ganga, this will speed up their development cycle as they will get a faster response to the usage queries that they have.
+
+As a student, you will gain experience with the challenges of large-scale computing, where some tasks of a large processing chain might take several days to process, have intermittent failures and have thousands of tasks processing in parallel. You will get experience with how LLMs can be integrated directly into projects to assist users in the use of the CLI and in understanding error messages.
+
+## Evaluation Task
+Interested students please contact Ulrik (see contact below) to ask questions and for an evaluation task.
+
+## Requirements
+Python programming (advanced), Linux command line experience (intermediate), use of git for code development and continuous integration testing (intermediate)
+
+## Mentors 
+  * [Alex Richards](mailto:a.richards@imperial.ac.uk)
+  * [Mark Smith](mailto:mark.smith1@imperial.ac.uk)
+  * **[Ulrik Egede](mailto:ulrik.egede@monash.edu)**
+
+## Links
+  * [Ganga](https://github.com/ganga-devs/ganga)
diff --git a/_gsocproposals/2025/proposal_GangaDeprecation.md b/_gsocproposals/2025/proposal_GangaDeprecation.md
@@ -0,0 +1,45 @@
+---
+project: Ganga
+title: Implement a deprecation system to keep code up to date
+layout: gsoc_proposal
+year: 2025
+difficulty: medium
+duration: 350
+mentor_avail: May-November
+organization:
+  - ImperialCollege
+  - MonashUniversity
+---
+
+## Description
+The amount of data that is processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors, located at tens of different locations across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to allow for the management of such large calculations. It helps the user to prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together in the end.
+
+Because the code has been developed over many years, there are parts of the API that have become redundant. This means that for a period of time there will be both the old, now deprecated, API and the new way of doing things. At the moment Ganga is missing a formal way of deprecating code. This means that warnings about using something deprecated are non-uniform, and there is also very old code that has never been cleaned up.
+
+The idea in this project is to formalise the way that code can be declared deprecated and then use the continuous integration to ensure that the code is eventually deleted.
+
+## Task ideas
+ * Have a well-defined way of marking plugins, functions etc. as deprecated, with a warning about when they will be removed. Building on top
+ of the Python package [deprecated](https://pypi.org/project/Deprecated/) might be an idea.
+ * Run tests in the testing framework that will alert developers that certain parts of the code can now be removed.
+ * Apply in the testing framework a similar system that will identify when deprecated Python features are used when moving to a new Python version.
+ * Apply the deprecation system to parts of the code that are already deprecated.
+
+## Expected results
+Obtain a cleaner code base where very old and long-deprecated code is no longer present. Provide the end user with consistent warnings about their use of deprecated code as well as when it will be removed.
+
+As a student, you will gain experience with the challenges of large-scale computing, where some tasks of a large processing chain might take several days to process, have intermittent failures and have thousands of tasks processing in parallel. You will get experience with working within a large code base that has gone through many developments.
+
+## Evaluation Task
+Interested students please contact Ulrik (see contact below) to ask questions and for an evaluation task.
+
+## Requirements
+Python programming (advanced), Linux command line experience (intermediate), use of git for code development and continuous integration testing (intermediate)
+
+## Mentors 
+  * [Alex Richards](mailto:a.richards@imperial.ac.uk)
+  * [Mark Smith](mailto:mark.smith1@imperial.ac.uk)
+  * **[Ulrik Egede](mailto:ulrik.egede@monash.edu)**
+
+## Links
+  * [Ganga](https://github.com/ganga-devs/ganga)
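
Note on `proposal_GangaDeprecation.md` above: the first two task ideas (a uniform deprecation marker, plus a test that flags code due for removal) could look roughly like the sketch below. It uses only the Python standard library rather than the `Deprecated` package the proposal mentions. The names `deprecated`, `is_overdue`, `old_submit` and `new_submit`, and all version numbers, are hypothetical and are not part of Ganga.

```python
import functools
import warnings


def deprecated(since, remove_in, alternative=None):
    """Mark a callable as deprecated and state when it will be removed.

    Hypothetical helper for illustration; not an existing Ganga API.
    """
    def decorator(func):
        message = (
            f"{func.__name__} is deprecated since {since} "
            f"and will be removed in {remove_in}."
        )
        if alternative:
            message += f" Use {alternative} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        # Store the removal version so a CI test can find overdue code.
        wrapper.__deprecated_remove_in__ = remove_in
        return wrapper
    return decorator


def _as_tuple(version):
    """Turn a dotted version string such as '9.0.0' into (9, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_overdue(func, current_version):
    """Return True if func was scheduled for removal at or before current_version."""
    remove_in = getattr(func, "__deprecated_remove_in__", None)
    if remove_in is None:
        return False
    return _as_tuple(current_version) >= _as_tuple(remove_in)


# Hypothetical deprecated function; the version numbers are made up.
@deprecated(since="8.7.0", remove_in="9.0.0", alternative="new_submit")
def old_submit(job_name):
    return f"submitted {job_name}"
```

A continuous integration test could then walk the registered plugins and functions and fail whenever `is_overdue(obj, current_version)` is true, which is one way to make sure deprecated code is eventually deleted.
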
diff --git a/_gsocproposals/2025/proposal_Geant4-fastsim_representation.md b/_gsocproposals/2025/proposal_Geant4-fastsim_representation.md
@@ -0,0 +1,56 @@
+---
+title: Geant4-FastSim - Data Representation Optimisation for Generative Model-based Fast Calorimeter Shower Simulation
+layout: gsoc_proposal
+project: Geant4
+year: 2025
+difficulty: medium
+duration: 350
+mentor_avail: June-October
+organization:
+  - CERN
+---
+
+## Description
+
+High energy physics experiments such as those operated at the Large Hadron Collider (LHC) fundamentally rely on detailed and realistic simulations of particle interactions with the detector. The state-of-the-art Geant4 toolkit provides a means of conducting these simulations with Monte Carlo procedures.  However, the simulation of particle showers in the calorimeter systems of collider detectors with such tools is a computationally intensive task. For this reason, alternative fast simulation approaches based on generative models have received significant attention, with these models now being deployed in production by current experiments at the LHC. In order to develop the next generation of fast simulation tools, approaches are being explored that would be able to handle larger data dimensionalities stemming from the higher granularity present in future detectors, while also being efficient enough to provide a sizable simulation speed-up for low energy showers. 
+
+A shower representation which has the potential to meet these criteria is a point cloud, which can be constructed from the position, energy and time of hits in the calorimeter. Since Geant4 provides access to the (very numerous) individual physical interactions simulated in the calorimeter, it also provides a means to create a representation independent of the physical readout geometry of the detector. This project will explore different approaches to clustering these individual simulated hits into a point cloud, seeking to minimise the number of points while preserving key calorimetric observables.
+
+## First Steps
+
+1. Gain a basic understanding of calorimeter shower simulation ([G4FastSim](https://g4fastsim.web.cern.ch/))
+2. Try simulating some electromagnetic particle showers with the [Key4hep](https://key4hep.github.io/key4hep-doc/) framework (see test)
+3. Propose different approaches to clustering, with justification
+
+## Project Milestones
+
+* Survey different approaches to clustering
+* Implement and experiment with the different methods
+* Investigate the impact of varying the detector granularity on the performance of separate clustering algorithms
+* If time allows, hadronic showers could also be investigated
+
+## Expected Results
+
+* A comparison of different approaches to clustering, with a performance evaluation in terms of the effect on calorimetric observables.
+* An evaluation of the impact of varying the granularity of the detector readout on the performance of the clustering algorithm
+
+## Requirements
+
+* C++, Python
+* Familiarity with PyTorch could be an advantage
+
+## Evaluation Tasks and Timeline
+
+1. Find the test [here](https://docs.google.com/document/d/1XYF8xFfprqiYYnjxu7Bzm8Ps-s646VJhIkDCQJd8n_8/edit?usp=sharing). Please submit it by 9:00 CET 17th March 2025 along with a short proposal (2 pages max) describing how you would approach the problem. See submission instructions in the test doc. Please don't forget to start the subject line with "GSoC'25 FastSim".
+2. We will make the selections based on the test, short proposal and resume by 17:00 CET 24th March.
+3. Selected candidates will then write the full proposal and submit it according to the official GSoC timeline.
+
+## Mentors
+(As we typically receive a large number of responses and we are not able to reply to all initial messages, please only contact us after completing the test)
+* [Peter McKeown](mailto:peter.mckeown@cern.ch) (CERN)
+* Piyush Raikwar (CERN)
+* Anna Zaborowska (CERN)
+
+## Links
+* [G4FastSim](https://g4fastsim.web.cern.ch/)
+* [CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation](https://arxiv.org/abs/2410.21611)
diff --git a/_gsocproposals/2025/proposal_HGQforCICADA.md b/_gsocproposals/2025/proposal_HGQforCICADA.md
@@ -3,6 +3,9 @@ title: Highly Granular Quantization for CICADA
 layout: gsoc_proposal
 project: CICADA
 year: 2025
+difficulty: medium
+duration: 350
+mentor_avail: Flexible
 organization: princeton
 ---
 
@@ -27,6 +30,7 @@ Python, Tensorflow, Quantization
 ## Mentors
   * [Lino Gerlach](mailto:lino.oscar.gerlach@cern.ch)
   * [Isobel Ojalvo](mailto:iojalvo@princeton.edu)
+  * [Jennifer Ngadiuba](mailto:jennifer.ngadiuba@cern.ch)
   
 ## Links
   * [CICADA (homepage)](https://cicada.web.cern.ch/)
diff --git a/_gsocproposals/2025/proposal_XeusCpp-Debugging.md b/_gsocproposals/2025/proposal_XeusCpp-Debugging.md
diff --git a/_gsocproposals/2025/proposal_XeusCpp-Plugins.md b/_gsocproposals/2025/proposal_XeusCpp-Plugins.md
diff --git a/announcements/_posts/2024/2024-09-30-JuliaHEP2024.md b/announcements/_posts/2024/2024-09-30-JuliaHEP2024.md
diff --git a/announcements/_posts/2025/2025-02-10-JuliaHEP.md b/announcements/_posts/2025/2025-02-10-JuliaHEP.md
diff --git a/announcements/_posts/2025/2025-02-10-WLCG-HSF.md b/announcements/_posts/2025/2025-02-10-WLCG-HSF.md
diff --git a/gsoc/2025/mentors.md b/gsoc/2025/mentors.md
