| 1 | +--- |
| 2 | +project: HSF CernVM-FS |
| 3 | +title: CernVM-FS - Integration of FUSE-T library for macOS laptop clients. Leveraging modern overlay FS features in cvmfs_server |
| 4 | +author: Yuriy Belikov |
| 5 | +date: 25.09.2024 |
| 6 | +year: 2024 |
| 7 | +layout: blog_post |
| 8 | +logo: |
| 9 | +intro: | |
| 10 | + I worked on integrating additional overlay FS features into the server part of CVMFS and on replacing the macFUSE kernel extension with the FUSE-T user-space library for macOS laptop clients. |
| 11 | + At this point, a backbone for metadata-only copying and zero-copy directory renames is implemented for the Linux server part; it is still being refined and adjusted |
| 12 | + because of various differences in overlay FS behaviour discovered during the course of the project. FUSE-T support for macOS clients is mostly done, apart from several issues that we |
| 13 | + classified as critical for CVMFS client stability. I helped create a GitHub Actions CI pipeline for macOS clients and prepared a table of the FUSE-T issues encountered |
| 14 | + during this project. Currently, I am continuing to work on letting users switch back to the macFUSE kext. |
| 15 | +--- |
| 16 | + |
| 17 | +# Introduction |
| 18 | +When the accepted projects were announced, I had already joined the CVMFS project through the Ukrainian Remote Student Internship and was working |
| 19 | +on integrating two overlay FS capabilities, metadata-only copying and zero-copy directory renames, into the core of the cvmfs_server utility. |
| 20 | +However, we set the replacement of the macFUSE kernel extension with the FUSE-T user-space library as the main goal for GSoC. The motivation is a potential |
| 21 | +simplification of the macOS client installation process, which currently has two drawbacks: |
| 22 | +- Installing third-party kernel extensions requires a double reboot |
| 23 | +- macOS security protection mechanisms have to be degraded manually |
| 24 | + |
| 25 | +On top of that, FUSE-T potentially lays a foundation for a brew package for the CVMFS client. It also makes it possible to create a GitHub Actions CI pipeline for the macOS client. |
| 26 | +I also proposed continuing my work on the overlay FS updates during my participation in GSoC, which was welcomed. |
| 27 | +Since I had already been in touch with Valentin, my internship supervisor, and with other members of the CVMFS team, there was no need for a Community Bonding Period. |
| 28 | +Thus, I started working on these tasks. |
| 29 | + |
| 30 | +# Goals and objectives for FUSE-T project |
| 31 | +- Determine key components of FUSE-T installation: headers, libraries, any helper binaries |
| 32 | +- Update CMake build files accordingly (later we decided to keep backward compatibility with macFUSE kext) |
| 33 | +- Update the bash scripts that run various commands on behalf of the macOS client accordingly |
| 34 | +- Update the parts that test for the presence of macOS FUSE-related libraries |
| 35 | +- Change integration tests if necessary to comply with FUSE-T |
| 36 | + |
| 37 | +# Motivation behind cvmfs_server modification |
| 38 | +On the **cvmfs_server publish** command, the utility traverses the scratch area, stores information about the repository contents in file catalogs, and stores the content itself in compressed form. |
| 39 | +Since compression, hashing and file catalog updates are performed on full copies of the modified files, modern overlay FS features have the potential to improve the performance of cvmfs_server transactions by avoiding unnecessary data copying to the scratch area. |
| 40 | + |
| 41 | +However, integrating zero-copy directory renames turned out to be less trivial than first expected: |
| 42 | +- The initial logic of cvmfs transactions relies on the scratch area containing full copies of file system objects, so a zero-copy rename leads to wiping out the old directory with all its contents |
| 43 | +- Removing subdirectories creates a different footprint in the upper-layer directory: a whiteout appears in place of the removed subdirectory, which is not the case in the usual setup |
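
To make the upper-layer footprints mentioned above more concrete, here is a minimal Python sketch of how entries in an overlay FS upper directory could be told apart. It is not the code used in cvmfs_server; `classify_entry` and `classify_path` are hypothetical helpers written for illustration. The sketch relies on the overlay FS conventions documented in the kernel: a whiteout is a character device with device number 0/0, a renamed directory carries the `trusted.overlay.redirect` xattr, and a metadata-only copy carries the `trusted.overlay.metacopy` xattr.

```python
import os
import stat

# xattr names documented for overlay FS. Reading trusted.* attributes
# normally requires elevated privileges.
REDIRECT_XATTR = "trusted.overlay.redirect"
METACOPY_XATTR = "trusted.overlay.metacopy"


def classify_entry(mode, rdev, xattr_names):
    """Classify one upper-layer entry from its stat fields and xattr names."""
    if stat.S_ISCHR(mode) and rdev == 0:
        return "whiteout"        # char device 0/0 marks a removed entry
    if stat.S_ISDIR(mode) and REDIRECT_XATTR in xattr_names:
        return "renamed_dir"     # directory renamed without copying its content
    if stat.S_ISREG(mode) and METACOPY_XATTR in xattr_names:
        return "metadata_only"   # only metadata was copied up, data stays below
    return "regular"


def classify_path(path):
    """Hypothetical helper: classify a real path in the upper layer (Linux only)."""
    st = os.lstat(path)
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError:
        names = []               # xattrs unavailable, fall back to stat only
    return classify_entry(st.st_mode, st.st_rdev, names)
```

Keeping the classification separate from the file system access makes the rules easy to check without a mounted overlay, for example `classify_entry(stat.S_IFCHR, 0, [])` yields `"whiteout"`.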
| 44 | + |
| 45 | +# Goals and objectives for overlay FS project |
| 46 | +- Enable the metacopy and redirect_dir features when mounting CVMFS repositories |
| 47 | +- Update the scratch area traversal routine accordingly |
| 48 | +- Implement renaming of catalog entries (avoiding a remove + add sequence) |
| 49 | +- Implement metadata-only updates for catalog entries |
| 50 | +- Expand the integration tests with new cases that cover various renaming scenarios |
| 51 | +- Put the new functionality behind a separate option that makes it possible to enable or disable the feature |
| 52 | + |
| 53 | +# What was done for overlay FS project |
| 54 | +- Upgraded the scratch area traversal: it currently takes three steps, in order to properly separate whiteouts left after a removal from those left after a rename. |
| 55 | +- Upgraded the catalog manager: added objects that manage the new SQL queries. |
| 56 | +- Changed the FS synchronization mechanism to properly handle nested whiteouts. |
| 57 | +- New integration tests that deal with directory renames and various related cases: partial modification of nested content, and removal of nested files and directories. |
| 58 | +- Tracking of files created with the metadata-only xattr. |
| 59 | +- The overlay FS documentation got [a tiny patch](https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/Documentation/filesystems/overlayfs.rst?id=930b7c32ea2b514fb2c37aa3d4b946d954ee7fa2) clarifying the metacopy section. |
| 60 | + |
| 61 | +# What was done for FUSE-T project |
| 62 | +- CMake update to overcome issues with dyld failing to find dynamic libraries. |
| 63 | +- Updated CMake build files for FUSE-T usage. |
| 64 | +- Updated FUSE-T installation check. |
| 65 | +- GitHub Actions pipeline with updated macOS CI support (introduced small fixes for GitHub Actions setup created by Valentin). |
| 66 | +- Got the integration tests passing on a reduced test set. |
| 67 | + |
| 68 | +# Questions left for overlay FS (and yet to be resolved) |
| 69 | +- Benchmarking: needed because the changes required replacing the one-step scratch area traversal with a three-step one (could it be reduced to two steps?). |
| 70 | +- How to separate new files from updated files in renamed directories (in principle, such entries are absent from the read-only layer). |
| 71 | +- Is it possible to update only file content hash instead of the whole entry in a catalog DB table? |
| 72 | +- Nested catalogs support. |
| 73 | + |
| 74 | +# Encountered issues with FUSE-T (yet to be resolved) |
| 75 | +- FUSE-T invokes listxattr before calling getxattr: |
| 76 | +  - If your filesystem doesn’t expect this, you are in trouble: in our case, magic attributes don’t work properly. |
| 77 | +- Hidden extended attributes are not supported; this is not a big issue, since they are utilized by cvmfs_server (which is not supported on macOS) |
| 78 | +- Mounting takes long enough (usually a few seconds) for commands issued immediately afterwards to fail (such as calling **ls \<repo\>** right after mount) |
| 79 | +- Sometimes we get directory “loops” inside directories where regular files are usually stored: a nested directory refers to its parent directory |
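
As an illustration of the mount delay issue listed above, a test script could poll until the mount point is usable instead of running a command right away. The sketch below is only an assumption about how such a workaround might look, not something taken from the CVMFS code base: `wait_until` and `wait_for_mount` are hypothetical helpers, and whether `os.path.ismount` is a sufficient readiness check for a FUSE-T mount has not been verified here.

```python
import os
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_mount(mountpoint, timeout=10.0):
    """Hypothetical helper: wait until `mountpoint` is reported as a mount point."""
    # Assumption: os.path.ismount is a good enough readiness signal.
    return wait_until(lambda: os.path.ismount(mountpoint), timeout)
```

A caller would run the follow-up command (such as listing the repository) only if `wait_for_mount` returns `True`, and report a timeout otherwise.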
| 80 | + |
| 81 | +# Conclusions |
| 82 | +It was quite an interesting journey that led to two talks: one at a [CERN-SFT group meeting](https://indico.cern.ch/event/1402909/), where I presented preliminary benchmarks |
| 83 | +of the overlay FS redirect_dir and metacopy options on a toy setup; the other at the [CERN-VM FS Workshop 2024](https://indico.cern.ch/event/1347727/timetable/), where I presented the |
| 84 | +outcomes of my GSoC project in more detail. |
| 85 | +I am grateful for the opportunity to collaborate on this project and contribute to such a huge organisation as CERN. I want to thank my supervisor |
| 86 | +Valentin Volkl for his guidance and support all the way through this period. |
| 87 | +There is still work to do: FUSE-T integration turned out to be not as easy as we expected, as it brings some peculiar issues |
| 88 | +with repository reloads and IPC management that are yet to be resolved. |
| 89 | +Likewise, the overlay FS part contained various not-so-trivial parts, such as nested whiteouts and partial content updates for directories. |
| 90 | +I hope to help resolve the remaining issues, and that my work will be a good foundation for a future release of CVMFS. |
| 91 | + |
| 92 | +**Related repositories:** [CVMFS fork](https://github.com/YBelikov/cvmfs), [CVMFS origin](https://github.com/cvmfs/cvmfs) |
| 93 | + |
| 94 | +**Check out my work here:** [FUSE-T](https://github.com/cvmfs/cvmfs/pull/3587), [Overlay FS](https://github.com/cvmfs/cvmfs/pull/3547) |