diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md
index d99d5c51..e5ab58eb 100644
--- a/docs/platforms/mlp/index.md
+++ b/docs/platforms/mlp/index.md
@@ -35,8 +35,9 @@ There are three main file systems mounted on the MLP clusters Clariden and Brist
 
 | type |mount | filesystem |
 | -- | -- | -- |
-| Home | /users/$USER | [VAST][ref-alps-vast] |
+| Home | `/users/$USER` | [VAST][ref-alps-vast] |
 | Scratch | `/iopsstor/scratch/cscs/$USER` | [Iopsstor][ref-alps-iopsstor] |
+| | `/capstor/scratch/cscs/$USER` | [Capstor][ref-alps-capstor] |
 | Project | `/capstor/store/cscs/swissai/` | [Capstor][ref-alps-capstor] |
 
 ### Home
@@ -51,14 +52,31 @@ Use scratch to store datasets that will be accessed by jobs, and for job output.
 Scratch is per user - each user gets separate scratch path and quota.
 
 * The environment variable `SCRATCH=/iopsstor/scratch/cscs/$USER` is set automatically when you log into the system, and can be used as a shortcut to access scratch.
+* There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`.
 
 !!! warning "scratch cleanup policy"
     Files that have not been accessed in 30 days are automatically deleted.
 
     **Scratch is not intended for permanent storage**: transfer files back to the capstor project storage after job runs.
 
-!!! note
-    There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`, however this is not recommended for ML workloads for performance reasons.
+!!! note "file system suitability"
+    The Capstor scratch filesystem is based on HDDs and is optimized for large, sequential read and write operations.
+    We recommend using Capstor for storing **checkpoint files** and other **large, contiguous outputs** generated by your training runs.
+    In contrast, Iopsstor uses high-performance NVMe drives, which excel at handling **IOPS-intensive workloads** involving frequent, random access. This makes it a better choice for storing **training datasets**, especially when accessed randomly during machine learning training.
+
+### Scratch Usage Recommendations
+
+Use Iopsstor scratch (`$SCRATCH`) for:
+
+* Training and validation datasets that are read frequently and non-sequentially.
+* Workloads that perform many small, random I/O operations.
+
+Use Capstor scratch (`/capstor/scratch/cscs/$USER`) for:
+
+* Storing model checkpoints.
+* Outputs from simulations or training jobs that involve large, contiguous I/O.
+
+After your job completes, remember to transfer any important results to your permanent project storage.
 
 ### Project
 
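Review note: the placement guidance this patch adds (datasets on Iopsstor scratch, checkpoints on Capstor scratch) could be sketched in Python roughly as follows. This is only an illustration and is not part of the patch. The two scratch roots are taken from the table in the diff, while the helper `scratch_path` and the workload labels `"dataset"` and `"checkpoint"` are hypothetical names introduced here, not CSCS tooling.

```python
from pathlib import PurePosixPath

# Scratch roots as listed in the file system table in the diff.
SCRATCH_ROOTS = {
    "iopsstor": "/iopsstor/scratch/cscs",
    "capstor": "/capstor/scratch/cscs",
}

# Hypothetical workload labels (an assumption for this sketch), mapped to
# the file system the new note recommends for each access pattern.
RECOMMENDED_FS = {
    "dataset": "iopsstor",    # frequent, small, random reads (NVMe)
    "checkpoint": "capstor",  # large, sequential writes (HDD)
}


def scratch_path(workload: str, user: str) -> PurePosixPath:
    """Return the per-user scratch directory suggested for a workload label."""
    if workload not in RECOMMENDED_FS:
        raise ValueError(f"unknown workload label: {workload!r}")
    return PurePosixPath(SCRATCH_ROOTS[RECOMMENDED_FS[workload]]) / user
```

A training script might call `scratch_path("dataset", user)` when staging input data and `scratch_path("checkpoint", user)` when choosing where to write model state. Neither location is permanent, so results would still need to be copied to project storage after the job.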