Commit 5d78866

gsoc25: add cvmfs proposal
1 parent 5119563 commit 5d78866

3 files changed

+72
-0
lines changed
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
title: CernVM-FS
3+
project: CernVM-FS
4+
layout: default
5+
logo: cernvmfs-logo.png
6+
description: |
7+
The CernVM File System ([CVMFS](https://cernvm.cern.ch/fs/)) is a global, read-only POSIX file system that provides the universal namespace /cvmfs. It is based on content-addressable storage, Merkle trees, and HTTP data transport. CernVM-FS provides mission-critical infrastructure to small and large HEP collaborations.
8+
---
9+
10+
{% include gsoc_project.ext %}
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
---
2+
title: Evaluate Distribution of ML model files on CVMFS
3+
layout: gsoc_proposal
4+
project: CernVM-FS
5+
year: 2025
6+
organization:
7+
- CERN
8+
difficulty: medium
9+
duration: 175
10+
mentor_avail: June-October
11+
---
12+
13+
# Description
14+
15+
Particle physicists studying nature at the highest energy scales at the Large Hadron Collider rely on simulations and data processing for their experiments.
16+
These workloads run on the "computing grid", a massive globally distributed computing infrastructure.
17+
Deploying software efficiently and reliably to this grid is an important and challenging task.
18+
CVMFS is an optimized shared file system developed specifically for this purpose: it is implemented as a read-only POSIX file system in user space (a FUSE module).
19+
Files and directories are hosted on standard web servers and mounted in the universal namespace `/cvmfs`.
20+
In many cases, it replaces package managers and shared software areas on cluster file systems as a means to distribute the software used to process experiment data.
21+
22+
## Task idea
23+
24+
CVMFS is optimized for the distribution of software (header files, scripts and libraries): its caching takes advantage of repeated access patterns, and files that appear in several versions can be deduplicated.
25+
CVMFS can also provide a general read-only POSIX file system view of data in external storage. A very common use case is to make conditions databases available to workloads running on distributed computing infrastructure, and various other datasets have been published in CVMFS as well.
26+
How efficient CVMFS is depends on the details of each use case; often the benefit for users lies simply in leveraging the existing server and proxy infrastructure.
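
The deduplication mentioned above follows from content-addressable storage: an object is keyed by a hash of its content, so a file that is unchanged between two versions is stored only once. The following is a minimal sketch of that idea only, not CVMFS's actual storage format. The hash function, the in-memory `store` dictionary and the `store_tree` and `demo` helpers are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def store_tree(root: Path, store: dict[str, bytes]) -> dict[str, str]:
    """Add every file under root to the store; return a path -> digest catalog."""
    catalog = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            # Hash choice is arbitrary here; it only needs to identify content.
            digest = hashlib.sha256(data).hexdigest()
            store.setdefault(digest, data)  # identical content is stored once
            catalog[str(path.relative_to(root))] = digest
    return catalog


def demo() -> tuple[int, int]:
    """Publish two hypothetical versions that share one unchanged file."""
    store: dict[str, bytes] = {}
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for version, script in (("v1", b"print('v1')\n"), ("v2", b"print('v2')\n")):
            tree = base / version
            tree.mkdir()
            (tree / "common.h").write_bytes(b"#define ANSWER 42\n")  # unchanged
            (tree / "run.py").write_bytes(script)  # differs per version
        catalogs = [store_tree(base / v, store) for v in ("v1", "v2")]
    files_published = sum(len(catalog) for catalog in catalogs)
    return files_published, len(store)


if __name__ == "__main__":
    published, stored = demo()
    print(f"{published} files published, {stored} objects stored")
```

Large model files that change completely between versions would benefit little from this mechanism, which is one reason the access pattern needs to be measured rather than assumed.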
27+
28+
29+
In this project proposal, we'd like to evaluate CVMFS as a means to distribute machine learning model files used in inference, for example `.onnx` files. The main focus will be on creating a test deployment and benchmarking access to it, and possibly on writing utilities and scripts to aid in the deployment of models on CVMFS. We'd also like to compare CVMFS with existing inference servers like KServe, and see whether it could be integrated as a storage backend.
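
As a starting point for the benchmarking part, a minimal sketch of an access benchmark could time one initial sequential read of a model file and compare it with repeated reads. The `time_read` and `benchmark` helpers and the example path are hypothetical, and the first read is only a cold read if the client and kernel caches were emptied beforehand.

```python
import statistics
import time
from pathlib import Path


def time_read(path: Path, chunk_size: int = 1 << 20) -> tuple[float, int]:
    """Read the whole file sequentially; return (seconds, bytes read)."""
    total = 0
    start = time.perf_counter()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            total += len(chunk)
    return time.perf_counter() - start, total


def benchmark(path: Path, repeats: int = 5) -> dict[str, float]:
    """Time one initial read, then the median of several repeated reads."""
    first_s, size = time_read(path)
    later = [time_read(path)[0] for _ in range(repeats)]
    return {
        "size_bytes": size,
        "first_read_s": first_s,
        "median_repeat_s": statistics.median(later),
    }


if __name__ == "__main__":
    # Hypothetical location; substitute a model file from the test deployment.
    model = Path("/cvmfs/example.repo.org/models/model.onnx")
    if model.exists():
        print(benchmark(model))
```

Reading the raw bytes isolates the file system from the inference runtime; a fuller benchmark would also time model loading and inference in the runtime itself.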
30+
31+
32+
33+
34+
## Expected results and milestones
35+
36+
* Familiarization with the CVMFS server infrastructure
37+
* Familiarization with ML model usage at CERN and a survey of common inference model file formats
38+
*
39+
* Test deployment of models relevant to ML4EP
40+
* Benchmark and evaluation of inference using models served from CVMFS
41+
* Addition of the benchmark to the CVMFS continuous benchmarking infrastructure
42+
* Writing a best practices document for the CVMFS documentation
43+
44+
45+
## Requirements
46+
47+
* UNIX/Linux
48+
* Interest in scientific computing devops
49+
* Familiarity with common ML libraries, in particular ONNX
50+
51+
52+
## Mentors
53+
54+
* **[Valentin Volkl](mailto:[email protected])**
55+
* [Lorenzo Moneta](mailto:[email protected])
56+
57+
58+
## Links
59+
60+
* [CVMFS](https://cernvm.cern.ch/fs/)
61+
* [KServe](https://kserve.github.io/website)

gsoc/2025/mentors.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ layout: plain
17 17
* Stephan Lachnit [[email protected]](mailto:[email protected]) DESY
18 18
* David Lange [[email protected]](mailto:[email protected]) CompRes
19 19
* Serguei Linev [[email protected]](mailto:[email protected]) GSI
20+
* Lorenzo Moneta [[email protected]](mailto:[email protected]) CERN
20 21
* Giacomo Parolini [[email protected]](mailto:[email protected]) CERN
21 22
* Alexander Penev [[email protected]](mailto:[email protected]) CompRes/University of Plovdiv, BG
22 23
* Mayank Sharma [[email protected]](mailto:[email protected]) UMich
