Commit 7f46051

egede and vvolkl authored
Ganga projects for GSoC 2025 (#1666)
* Ganga projects for GSoC 2025
* Update Monash link
* Fix spelling error
* Update _gsocorgs/2025/monashuniversity.md
* placate link checker
* Fix typos and implement feedback

---------

Co-authored-by: Valentin Volkl <[email protected]>
1 parent 10f101e commit 7f46051

6 files changed, +126 -0 lines changed


.github/config/mdcheck.json

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@
 {
   "pattern": "https://ariostas.com"
 },
+{
+  "pattern": "https://www.monash.edu"
+},
 {
   "pattern": "https://indico.desy.de"
 }

_gsocorgs/2025/imperialcollege.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: "Imperial College London"
author: "Enric Tejedor"
layout: default
organization: ImperialCollege
logo: Imperial-College-London2.png
description: |
  [Imperial College London](https://www.imperial.ac.uk/) is a world top ten university with an international reputation for excellence in teaching and research. Consistently rated amongst the world's best universities, Imperial is committed to developing the next generation of researchers, scientists and academics through collaboration across disciplines. Located in the heart of London, Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.
---

{% include gsoc_proposal.ext %}

_gsocorgs/2025/monashuniversity.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: "Monash University"
author: "Ulrik Egede"
layout: default
organization: MonashUniversity
logo: Monash.png
description: |
  [Monash University](https://www.monash.edu) is one of Australia's leading universities and ranks among the world's top 100. It helps change lives through research and education. Its large Faculty of Science is active across all areas of science, from particle physics to the development of new methods for identifying rare-earth minerals.
---

{% include gsoc_proposal.ext %}
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
project: Ganga
layout: default
logo: ganga_logo_150dpi.png
description: |
  [Ganga](https://github.com/ganga-devs/ganga) is a computational task-management tool that allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources.
  Ganga was developed to solve a problem that is increasingly common in scientific projects: researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources.
---

{% include gsoc_project.ext %}
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
project: Ganga
title: Incorporate a Large Language Model to assist users
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown enormously over the past decade. It is not unusual for a user to have data processed on tens of thousands of processors spread across tens of sites around the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

Because Ganga is a scripting and command-line interface, some users will naturally struggle to get the syntax right. To solve their problems, they often spend time searching through mailing lists, FAQs and discussion fora, or simply wait for a more experienced coder to debug the problem for them. The idea of this project is to integrate a Large Language Model (LLM) into the Ganga command prompt. This should allow users to describe in words what they would like to do and get an example that they can incorporate. It should also intercept exceptions thrown by the Ganga interface, help the user understand them, and propose solutions.
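
One way the exception interception could be wired up is sketched below, assuming a plain Python session where `sys.excepthook` is honoured. Ganga's interactive prompt is built on IPython, which handles errors in interactive cells itself rather than through `sys.excepthook`, so a real integration would more likely register a handler via IPython's `InteractiveShell.set_custom_exc`. `ask_llm` is a hypothetical placeholder for whatever client talks to the local or remote LLM; it is not part of Ganga.

```python
import sys
import traceback
from typing import Callable


def make_llm_excepthook(ask_llm: Callable[[str], str]):
    """Build an excepthook that prints the normal traceback, then an LLM explanation.

    `ask_llm` is a hypothetical callable (prompt -> answer); it is an assumption
    of this sketch, not an existing Ganga or Ollama function.
    """

    def hook(exc_type, exc_value, exc_tb):
        # Always show the standard traceback first so no information is lost.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        tb_text = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
        prompt = (
            "A user of the Ganga job-management tool hit this error.\n"
            "Explain it briefly and propose a fix.\n\n" + tb_text
        )
        try:
            print("\n[assistant]", ask_llm(prompt), file=sys.stderr)
        except Exception as err:
            # Never let a failing assistant hide the original error.
            print(f"\n[assistant unavailable: {err}]", file=sys.stderr)

    return hook
```

Installing it would be a single assignment such as `sys.excepthook = make_llm_excepthook(my_client)`. Keeping the standard traceback and catching failures of the helper means the assistant can only add information, never hide the original error.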

We already have an interface based on Ollama that builds a retrieval-augmented generation (RAG) knowledge base containing extra information about Ganga that was not available when the underlying LLM was trained.
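
The retrieval step of a RAG can be illustrated with a deliberately simplified sketch. It ranks snippets by bag-of-words cosine similarity instead of the embedding model that an Ollama-based setup would use, and all function names are hypothetical. Only the best-matching snippets are placed in the prompt sent to the LLM.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for a real embedding vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, snippets: List[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, bag_of_words(s)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, snippets: List[str], k: int = 2) -> str:
    """Prepend the retrieved snippets to the user's question."""
    context = "\n".join(f"- {s}" for s in retrieve(question, snippets, k))
    return f"Use this Ganga documentation where relevant:\n{context}\n\nQuestion: {question}"
```

Swapping `bag_of_words` and `cosine` for real embeddings would leave `retrieve` and `build_prompt` unchanged, which is the part that has to be integrated into Ganga.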

## Task ideas
* Integrate the interaction with the LLM and the RAG into Ganga.
* Use past input and output in the CLI as context for the LLM.
* Set up a server so that the LLM can run remotely, requiring minimal installation by the user.
* Test which sources are most useful to add to the RAG (mailing list discussions, manuals, instant messages).
* Develop continuous integration tests that ensure the LLM integration keeps working.
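
For the task of using past CLI input and output as context, one possible shape is a bounded history that is prepended to each question. The class name `CliContext`, the default of five entries and the output truncation length are illustrative assumptions. Because the LLM client is passed in as a plain callable, a continuous integration test can substitute a stub and check the assembled prompt without a running model.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class CliContext:
    """Keep the last few (command, output) pairs as LLM context (hypothetical helper)."""

    def __init__(self, ask_llm: Callable[[str], str], max_entries: int = 5, max_chars: int = 500):
        self._ask_llm = ask_llm
        self._max_chars = max_chars
        # The deque drops the oldest entry automatically once max_entries is reached.
        self._history: Deque[Tuple[str, str]] = deque(maxlen=max_entries)

    def record(self, command: str, output: str) -> None:
        # Truncate long outputs (e.g. large job listings) to keep the prompt small.
        self._history.append((command, output[: self._max_chars]))

    def prompt_for(self, question: str) -> str:
        lines = ["Recent Ganga session:"]
        for command, output in self._history:
            lines.append(f">>> {command}")
            if output:
                lines.append(output)
        lines.append(f"\nUser question: {question}")
        return "\n".join(lines)

    def ask(self, question: str) -> str:
        return self._ask_llm(self.prompt_for(question))
```

The bounded history keeps prompts short for a small local model. A remote server could accept longer context, so the limits would likely need tuning.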

## Expected results
For the scientific users of Ganga, this will speed up their development cycle, as they will get faster answers to their usage queries.

As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a large processing chain might take several days to process and suffer intermittent failures, and where thousands of tasks are processed in parallel. You will also gain experience with integrating LLMs directly into projects to help users work with the CLI and understand error messages.

## Evaluation Task
Interested students should contact Ulrik (see contact details below) to ask questions and to request an evaluation task.

## Requirements
Python programming (advanced), Linux command line experience (intermediate), use of Git for code development and continuous integration testing (intermediate)

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
project: Ganga
title: Implement a deprecation system to keep code up to date
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown enormously over the past decade. It is not unusual for a user to have data processed on tens of thousands of processors spread across tens of sites around the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

Because Ganga has been developed over many years, parts of its API have become redundant. As a result, for a period of time the old, now deprecated, API and the new way of doing things coexist. At the moment, Ganga lacks a formal way of deprecating code. This means that warnings about using deprecated features are not uniform, and some very old code has never been cleaned up.

The idea of this project is to formalise the way code is declared deprecated, and then use continuous integration to ensure that the deprecated code is eventually deleted.
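
The sketch below makes the idea concrete using only the standard library. A decorator records, in a module-level registry, the version in which a function is due to be removed, and emits a `DeprecationWarning` on every call. `overdue()` lists the entries whose removal version has been reached, so a CI job can fail until they are deleted. The names `deprecated_until`, `DEPRECATED` and `overdue`, and the `(major, minor)` version tuples, are assumptions for illustration rather than existing Ganga code. The `deprecated` package mentioned in the task ideas provides a ready-made decorator that could replace the hand-written one.

```python
import functools
import warnings
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, int]

# Hypothetical registry: qualified name -> version in which it will be removed.
DEPRECATED: Dict[str, Version] = {}


def deprecated_until(remove_in: Version, reason: str = "") -> Callable:
    """Mark a function as deprecated and record when it is due for removal."""

    def decorator(func: Callable) -> Callable:
        name = f"{func.__module__}.{func.__qualname__}"
        DEPRECATED[name] = remove_in
        message = f"{name} is deprecated and will be removed in version {remove_in[0]}.{remove_in[1]}"
        if reason:
            message += f": {reason}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def overdue(current: Version) -> List[str]:
    """Names whose removal version has been reached; CI could fail if non-empty."""
    return sorted(name for name, remove_in in DEPRECATED.items() if current >= remove_in)
```

A CI test could then assert `not overdue(current_version)`, with `current_version` taken from the release being prepared.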

## Task ideas
* Define a consistent way of marking plugins, functions, etc. as deprecated, with a warning stating when they will be removed. Building on top of the Python package [deprecated](https://pypi.org/project/Deprecated/) might be an option.
* Run tests in the testing framework that alert developers when certain parts of the code can be removed.
* Add a similar check to the testing framework that identifies the use of deprecated Python features when moving to a new Python version.
* Apply the deprecation system to parts of the code that are already deprecated.
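
For the check on deprecated Python features, a common standard-library approach is to escalate `DeprecationWarning` and `PendingDeprecationWarning` to errors while tests run. Code that relies on a feature scheduled for removal then fails in CI before a Python upgrade breaks it. With pytest, the same effect is usually obtained with the `-W error::DeprecationWarning` command-line option or a `filterwarnings` setting in the configuration. The helper below is a hypothetical, framework-agnostic sketch of the idea.

```python
import warnings
from typing import Any, Callable, List


def deprecation_failures(checks: List[Callable[[], Any]]) -> List[str]:
    """Run each check with deprecation warnings escalated to errors.

    Returns one description per check that triggered a DeprecationWarning or
    PendingDeprecationWarning, so CI can report all offenders at once.
    """
    failures = []
    for check in checks:
        # catch_warnings restores the previous filters after each check.
        with warnings.catch_warnings():
            warnings.simplefilter("error", DeprecationWarning)
            warnings.simplefilter("error", PendingDeprecationWarning)
            try:
                check()
            except (DeprecationWarning, PendingDeprecationWarning) as exc:
                failures.append(f"{check.__name__}: {exc}")
    return failures
```

Collecting failures instead of stopping at the first one gives developers the full list of places to update when preparing for a new Python version.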

## Expected results
Obtain a cleaner code base from which very old, long-deprecated code has been removed, and provide end users with consistent warnings about their use of deprecated code, including when it will be removed.

As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a large processing chain might take several days to process and suffer intermittent failures, and where thousands of tasks are processed in parallel. You will also gain experience working within a large code base that has evolved through many development cycles.

## Evaluation Task
Interested students should contact Ulrik (see contact details below) to ask questions and to request an evaluation task.

## Requirements
Python programming (advanced), Linux command line experience (intermediate), use of Git for code development and continuous integration testing (intermediate)

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)
