Ganga projects for GSoC 2025 #1666
Merged
7 commits
66b9f78 Ganga projects for GSoC 2025 (egede)
0512dcb Update Monash link (egede)
739d78b Fix spelling error (egede)
b6c27b3 Update _gsocorgs/2025/monashuniversity.md (vvolkl)
a71c85d placate link checker (vvolkl)
0e20b2e Merge branch 'HSF:main' into Ganga2025 (egede)
cc0fd27 Fix typos and implement feedback (egede)

@@ -0,0 +1,11 @@
---
title: "Imperial College London"
author: "Enric Tejedor"
layout: default
organization: ImperialCollege
logo: Imperial-College-London2.png
description: |
  [Imperial College London](https://www.imperial.ac.uk/) is a world top-ten university with an international reputation for excellence in teaching and research. Consistently rated amongst the world's best universities, Imperial is committed to developing the next generation of researchers, scientists and academics through collaboration across disciplines. Located in the heart of London, Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.
---

{% include gsoc_proposal.ext %}

@@ -0,0 +1,11 @@
---
title: "Monash University"
author: "Ulrik Egede"
layout: default
organization: MonashUniversity
logo: Monash.png
description: |
  [Monash University](https://www.monash.edu/science/schools/physics) is one of Australia's leading universities and ranks among the world's top 100. It helps change lives through research and education. Its large Faculty of Science is active across all areas of science, from particle physics to the development of new methods for identifying rare earth minerals.
---

{% include gsoc_proposal.ext %}

@@ -0,0 +1,11 @@
---
project: Ganga
layout: default
logo: ganga_logo_150dpi.png
description: |
  [Ganga](https://github.com/ganga-devs/ganga) is a computational task-management tool that allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources.
  Ganga was developed to solve a problem that is increasingly common in scientific projects: researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources.
---

{% include gsoc_project.ext %}

@@ -0,0 +1,45 @@
---
project: Ganga
title: Incorporate a Large Language Model to assist users
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors located at tens of different sites across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

Because Ganga is a scripting and command-line interface, some users will naturally have problems getting the syntax right. To solve this, they often spend time searching through mailing lists, FAQs and discussion fora, or simply wait for a more experienced coder to debug their problem. The idea of this project is to integrate a Large Language Model (LLM) into the Ganga command prompt. This should allow users to describe in words what they would like to do and get an example that they can incorporate. It should also intercept exceptions thrown by the Ganga interface, help the user understand them and propose solutions.

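One possible shape for the exception interception is sketched below as a minimal illustration, not a design decision. It assumes a plain Python `sys.excepthook` (standard library); hooking the same idea into Ganga's own interactive prompt is not shown. The names `build_error_prompt` and `install_llm_excepthook` are hypothetical, and `ask` stands in for whatever function ends up sending a prompt to the LLM backend.

```python
import sys
import traceback
from types import TracebackType
from typing import Callable, Optional, Type


def build_error_prompt(exc_type: Type[BaseException], exc: BaseException,
                       tb: Optional[TracebackType]) -> str:
    """Turn an uncaught exception into a plain-text question for the LLM."""
    formatted = "".join(traceback.format_exception(exc_type, exc, tb))
    return (
        "A user of the Ganga command-line interface hit the error below.\n"
        "Explain what it means and suggest how to fix it.\n\n"
        f"{formatted}"
    )


def install_llm_excepthook(ask: Callable[[str], str]) -> None:
    """Replace sys.excepthook so uncaught errors are followed by an LLM explanation.

    `ask` is a hypothetical callable: it takes a prompt and returns the model's answer.
    """
    original_hook = sys.excepthook

    def hook(exc_type, exc, tb):
        # Always show the normal traceback first, so the assistant never hides the real error.
        original_hook(exc_type, exc, tb)
        try:
            advice = ask(build_error_prompt(exc_type, exc, tb))
        except Exception as err:  # e.g. the LLM server is unreachable
            advice = f"(LLM assistant unavailable: {err})"
        print("\nAssistant:", advice, file=sys.stderr)

    sys.excepthook = hook
```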
We already have an interface based on Ollama that builds a retrieval-augmented generation (RAG) database containing extra information about Ganga that was not available when the underlying LLM was trained.

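To illustrate what the retrieval step of a RAG does, the toy sketch below ranks text snippets (for example manual sections or mailing-list posts) by word-count cosine similarity to the user's question and prepends the best matches to the prompt. This is a deliberate simplification and an assumption on our part: a real RAG normally compares embedding vectors produced by a model rather than raw word counts, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, snippets: List[str], k: int = 2) -> List[str]:
    """Return the k snippets most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(snippets, key=lambda s: cosine_similarity(q, bag_of_words(s)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, snippets: List[str], k: int = 2) -> str:
    """Prepend the retrieved snippets so the LLM can use knowledge it was not trained on."""
    context = "\n---\n".join(retrieve(question, snippets, k))
    return (
        "Answer the question about Ganga using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```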
## Task ideas
* Explore different options for integrating the LLM into a command-line prompt.
* Integrate the interaction with the LLM and RAG into Ganga.
* Set up a server so that the LLM can run remotely, requiring minimal installation by the user.
* Test which samples are most useful to add to the RAG (mailing list discussions, manuals, instant messages).
* Develop continuous integration tests that ensure the LLM integration keeps working.

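A continuous integration test does not need a running LLM server if the network call is mocked. The sketch below assumes the model is reached through Ollama's `/api/generate` HTTP endpoint on its default local port 11434, with streaming disabled so the whole answer arrives in one JSON reply. The `ask_llm` helper and the `llama3` model name are hypothetical placeholders. The test replaces `urllib.request.urlopen` with a mock, so it checks the request that would be sent and how the reply is parsed without contacting any server.

```python
import json
import unittest
import urllib.request
from unittest import mock

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_llm(prompt: str, model: str = "llama3", url: str = OLLAMA_URL,
            timeout: float = 60.0) -> str:
    """Send one prompt to the (hypothetically configured) Ollama server and return its answer."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


class AskLLMTest(unittest.TestCase):
    def test_returns_model_answer_without_a_server(self):
        # Fake HTTP reply object that also works as a context manager.
        fake_reply = mock.MagicMock()
        fake_reply.read.return_value = json.dumps({"response": "Use j.submit()"}).encode()
        fake_reply.__enter__.return_value = fake_reply

        with mock.patch("urllib.request.urlopen", return_value=fake_reply) as urlopen:
            answer = ask_llm("How do I submit a job?")

        self.assertEqual(answer, "Use j.submit()")
        # Inspect the JSON body that would have been sent to the server.
        sent = json.loads(urlopen.call_args.args[0].data)
        self.assertEqual(sent["prompt"], "How do I submit a job?")
        self.assertFalse(sent["stream"])
```

Such a test can run in CI with `python -m unittest` or pytest; a separate, less frequent job could exercise a real server.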
## Expected results
For the scientific users of Ganga, this will speed up their development cycle, as they will get faster answers to their usage queries.

As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a large processing chain might take several days to process, have intermittent failures and have thousands of tasks processing in parallel. You will also gain experience with integrating LLMs directly into projects to help users work with the CLI and understand error messages.

## Evaluation Task
Interested students should contact Ulrik (see contact details below) with questions and to request an evaluation task.

## Requirements
Python programming (advanced), Linux command-line experience (intermediate), use of git for code development and continuous integration testing (intermediate).

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)

@@ -0,0 +1,44 @@
---
project: Ganga
title: Formalise the deprecation of code in Ganga
layout: gsoc_proposal
year: 2025
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors located at tens of different sites across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put it all together at the end.

Because Ganga has been developed over many years, parts of its API have become redundant. This means that, for a period of time, the old and now-deprecated API coexists with the new way of doing things. At the moment Ganga lacks a formal way of deprecating code. As a result, warnings about using deprecated features are non-uniform, and some very old code has never been cleaned up.

The idea of this project is to formalise how code is declared deprecated, and then to use continuous integration to ensure that the deprecated code is eventually deleted.

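A uniform deprecation marker could look like the standard-library-only sketch below (Python 3.8+ assumed). The decorator name, its `since`/`removed_in`/`replacement` parameters and the `DEPRECATION_REGISTRY` dictionary are hypothetical; the task ideas suggest building on the Deprecated package, which provides its own `@deprecated` decorator. The point of the sketch is that every warning is generated from one template and every deprecation records its removal version in one place.

```python
import functools
import warnings
from typing import Callable, Dict, Optional, TypeVar

F = TypeVar("F", bound=Callable)

# Hypothetical registry: fully qualified name -> version in which the object will be removed.
DEPRECATION_REGISTRY: Dict[str, str] = {}


def deprecated(since: str, removed_in: str,
               replacement: Optional[str] = None) -> Callable[[F], F]:
    """Mark a function as deprecated, emitting one consistently worded warning."""
    def decorator(func: F) -> F:
        name = f"{func.__module__}.{func.__qualname__}"
        message = (f"{name} is deprecated since Ganga {since} "
                   f"and will be removed in Ganga {removed_in}.")
        if replacement:
            message += f" Use {replacement} instead."
        DEPRECATION_REGISTRY[name] = removed_in

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]
    return decorator
```

One design question is the warning category: Python ignores `DeprecationWarning` by default unless it is triggered by code in `__main__`, whereas the Python documentation describes `FutureWarning` as intended for end users of applications, so the latter may suit warnings aimed at Ganga users better.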
## Task ideas
* Establish a well-defined way of marking plugins, functions, etc. as deprecated, with a warning stating when they will be removed.
  Building on top of the Python package [Deprecated](https://pypi.org/project/Deprecated/) might be one option.
* Add tests to the testing framework that alert developers when certain parts of the code can be removed.
* Apply the deprecation system to parts of the code that are already deprecated.

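On the testing side, a check like the one below could fail the build once the current release reaches a scheduled removal version. It assumes that deprecated objects are recorded in a name-to-removal-version mapping (for example one filled in by a deprecation decorator) and that Ganga versions are purely numeric and dotted; `parse_version`, `overdue_deprecations` and `check_no_overdue_deprecations` are illustrative names, not existing Ganga functions.

```python
from typing import Dict, List, Tuple


def parse_version(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '9.2' into (9, 2, 0) so versions with fewer parts still compare correctly."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def overdue_deprecations(registry: Dict[str, str], current_version: str) -> List[str]:
    """Names whose scheduled removal version is at or below the current version."""
    current = parse_version(current_version)
    return sorted(
        name for name, removed_in in registry.items()
        if parse_version(removed_in) <= current
    )


def check_no_overdue_deprecations(registry: Dict[str, str], current_version: str) -> None:
    """Meant to be called from a CI test; raises if deprecated code should already be gone."""
    overdue = overdue_deprecations(registry, current_version)
    if overdue:
        raise AssertionError(
            f"Release {current_version} has reached the removal version of: {', '.join(overdue)}"
        )
```

Raising `AssertionError` explicitly, rather than using a bare `assert`, keeps the check active even when Python runs with `-O`.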
## Expected results
Obtain a cleaner code base from which very old, long-deprecated code has been removed. Provide end users with consistent warnings when they use deprecated code, including when that code will be removed.

As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a large processing chain might take several days to process, have intermittent failures and have thousands of tasks processing in parallel. You will also gain experience with working within a large code base that has evolved over many years of development.

## Evaluation Task
Interested students should contact Ulrik (see contact details below) with questions and to request an evaluation task.

## Requirements
Python programming (advanced), Linux command-line experience (intermediate), use of git for code development and continuous integration testing (intermediate).

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)