gsoc25: add cvmfs proposal #1670
Merged
Commits
5d78866 gsoc25: add cvmfs proposal (vvolkl)
59151bd Update _gsocproposals/2025/proposal_CVMFS_DistributeModelFiles.md (vvolkl)
cd84675 Merge branch 'main' into gsoc25-cvmfs (vvolkl)
a77ffad Merge branch 'main' into gsoc25-cvmfs (vvolkl)
1b5f61a Apply suggestions from code review (vvolkl)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
title: CernVM-FS | ||
project: CernVM-FS | ||
layout: default | ||
logo: cernvmfs-logo.png | ||
description: | | ||
The CernVM-File System ([CVMFS](https://cernvm.cern.ch/fs/)) is a global, read-only POSIX file system that provides the universal namespace /cvmfs. It is based on content-addressable storage, Merkle trees, and HTTP data transport. CernVM-FS provides mission-critical infrastructure to small and large HEP collaborations. | ||
--- | ||
|
||
{% include gsoc_project.ext %} |
61 changes: 61 additions & 0 deletions
_gsocproposals/2025/proposal_CVMFS_DistributeModelFiles.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
--- | ||
title: Evaluate Distribution of ML model files on CVMFS | ||
layout: gsoc_proposal | ||
project: CernVM-FS | ||
year: 2025 | ||
organization: | ||
- CERN | ||
difficulty: medium | ||
duration: 175 | ||
mentor_avail: June-October | ||
--- | ||
|
||
# Description | ||
|
||
Particle physicists studying nature at the highest energy scales at the Large Hadron Collider rely on simulations and data processing for their experiments. | ||
These workloads run on the "computing grid", a massive, globally distributed computing infrastructure. | ||
Deploying software efficiently and reliably to this grid is an important and challenging task. | ||
CVMFS is an optimized shared file system developed specifically for this purpose: it is implemented as a read-only POSIX file system in user space (a FUSE module). | ||
Files and directories are hosted on standard web servers and mounted in the universal namespace `/cvmfs`. | ||
In many cases, it replaces package managers and shared software areas on cluster file systems as a means to distribute the software used to process experiment data. | ||
|
||
## Task idea | ||
|
||
CVMFS is optimized for the distribution of software (header files, scripts and libraries). It takes advantage of repeated access patterns for its caching and of the ability to deduplicate files that are present in several versions. | ||
CVMFS is also capable of providing a general read-only POSIX file system view of data in external storage. A very common use case is making conditions databases available to workloads running on distributed computing infrastructure, but various other datasets have also been published in CVMFS. | ||
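
The deduplication mentioned above follows from content-addressable storage: a file is stored under a hash of its contents, so identical files in different software versions share one stored object. The following is a minimal sketch of that idea only, not CVMFS code. The `store` helper, the in-memory dictionaries, the repository paths, and the use of SHA-1 are all assumptions made for illustration; chunking, compression, and catalogs as CVMFS actually implements them are not modeled.

```python
import hashlib


def store(objects: dict[str, bytes], catalog: dict[str, str],
          path: str, data: bytes) -> str:
    """Store data under its content hash and record path -> hash.

    Hypothetical helper for illustration; not part of any CVMFS API.
    """
    digest = hashlib.sha1(data).hexdigest()
    # Identical content maps to the same key, so it is stored only once.
    objects.setdefault(digest, data)
    catalog[path] = digest
    return digest


objects: dict[str, bytes] = {}
catalog: dict[str, str] = {}

# Two versions ship an identical library; a third file differs.
# All paths below are made up for this example.
store(objects, catalog, "/cvmfs/example.repo/v1/lib/libfoo.so", b"same bytes")
store(objects, catalog, "/cvmfs/example.repo/v2/lib/libfoo.so", b"same bytes")
store(objects, catalog, "/cvmfs/example.repo/v2/lib/libbar.so", b"other bytes")
```

In this sketch the catalog holds three paths while the object store holds only two entries. Whether large ML model files benefit in a similar way would depend on how much content is actually shared between model versions.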
|
||
How efficient CVMFS can be depends on the details of each use case; often the benefit for users lies simply in leveraging the existing server and proxy infrastructure. | ||
|
||
|
||
|
||
In this project proposal, we'd like to evaluate CVMFS as a means to distribute machine learning model files used in inference, for example .onnx files. The main focus will be on creating a test deployment and benchmarking access to it, and possibly on writing utilities and scripts to aid in the deployment of models on CVMFS. We'd also like to compare CVMFS with existing inference servers like KServe, and see whether it could be integrated as a storage backend. | ||
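
As a rough illustration of what benchmarking file access could look like, the sketch below times repeated sequential reads of one file. It is a starting point under stated assumptions, not the project's benchmark: the function names `time_read` and `benchmark` and the 1 MiB chunk size are hypothetical choices. The first read is only "cold" if the local caches were empty beforehand, which this code does not enforce.

```python
import time
from pathlib import Path


def time_read(path: Path, chunk_size: int = 1 << 20) -> tuple[int, float]:
    """Read a file sequentially and return (bytes read, elapsed seconds)."""
    total = 0
    start = time.perf_counter()
    with path.open("rb") as f:
        # Read in fixed-size chunks so large model files are not held in memory.
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def benchmark(path: Path, repeats: int = 3) -> list[float]:
    """Time several consecutive reads of the same file.

    Later reads are likely to be served from local caches, so comparing
    the first timing with the rest gives a crude cold-versus-warm picture.
    """
    return [time_read(path)[1] for _ in range(repeats)]
```

A call such as `benchmark(Path("/cvmfs/example.repo/models/model.onnx"))` (a made-up path) would return one timing per read. A realistic evaluation would also need to control the cache state and measure inference itself, not just raw reads.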
|
||
|
||
|
||
|
||
## Expected results and milestones | ||
|
||
* Familiarization with the CVMFS server infrastructure | ||
* Familiarization with ML model usage at CERN; survey of common inference model file formats | ||
* Test deployment of models relevant to ML4EP | ||
* Benchmark and evaluation of inference using models served from CVMFS | ||
* Addition of the benchmark to the CVMFS continuous benchmarking infrastructure | ||
* Writing a best practices document for the CVMFS documentation | ||
|
||
|
||
## Requirements | ||
|
||
* UNIX/Linux | ||
* Interest in scientific computing devops | ||
* Familiarity with common ML libraries, in particular ONNX | ||
|
||
|
||
## Mentors | ||
|
||
* **[Valentin Volkl](mailto:[email protected])** | ||
* [Lorenzo Moneta](mailto:[email protected]) | ||
|
||
|
||
## Links | ||
|
||
* [CVMFS](https://cernvm.cern.ch/fs/) | ||
* [KServe](https://kserve.github.io/website) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ layout: plain | |
* David Lange [[email protected]](mailto:[email protected]) CompRes | ||
* Serguei Linev [[email protected]](mailto:[email protected]) GSI | ||
* Peter McKeown [[email protected]](mailto:[email protected]) CERN | ||
* Lorenzo Moneta [[email protected]](mailto:[email protected]) CERN | ||
* Felice Pantaleo [[email protected]](mailto:[email protected]) CERN | ||
* Giacomo Parolini [[email protected]](mailto:[email protected]) CERN | ||
* Alexander Penev [[email protected]](mailto:[email protected]) CompRes/University of Plovdiv, BG | ||
|