From 632e480ac88e418bfc34400cb8cba6a5a5442e80 Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:03:38 +0200 Subject: [PATCH 1/9] small change. --- docs/platforms/mlp/index.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index d99d5c51..abafbe71 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -58,7 +58,9 @@ Scratch is per user - each user gets separate scratch path and quota. **Scratch is not intended for permanent storage**: transfer files back to the capstor project storage after job runs. !!! note - There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`, however this is not recommended for ML workloads for performance reasons. + There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`. + This filesystem should perform better for contiguous reads and writes. + Therefore, we recommend using this filesystem for storing checkpoint files generated by your training runs. ### Project From db0070fde92c2ec86551b3689d05672da7ef73a7 Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:05:50 +0200 Subject: [PATCH 2/9] wording. --- docs/platforms/mlp/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index abafbe71..d793f06a 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -60,7 +60,7 @@ Scratch is per user - each user gets separate scratch path and quota. !!! note There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`. This filesystem should perform better for contiguous reads and writes. - Therefore, we recommend using this filesystem for storing checkpoint files generated by your training runs. 
+ Therefore, we recommend using capstor for storing checkpoint files generated by your training runs. ### Project From 756409f452844d6bd1d999aa5efdf2120245c48b Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:15:56 +0200 Subject: [PATCH 3/9] better explanation --- docs/platforms/mlp/index.md | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index d793f06a..9ac6b739 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -51,16 +51,29 @@ Use scratch to store datasets that will be accessed by jobs, and for job output. Scratch is per user - each user gets separate scratch path and quota. * The environment variable `SCRATCH=/iopsstor/scratch/cscs/$USER` is set automatically when you log into the system, and can be used as a shortcut to access scratch. +* There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`. !!! warning "scratch cleanup policy" Files that have not been accessed in 30 days are automatically deleted. **Scratch is not intended for permanent storage**: transfer files back to the capstor project storage after job runs. -!!! note - There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`. - This filesystem should perform better for contiguous reads and writes. - Therefore, we recommend using capstor for storing checkpoint files generated by your training runs. +!!! note "file system suitability" + The Capstor scratch filesystem is based on HDDs and is optimized for large, sequential read and write operations. + We recommend using Capstor for storing **checkpoint files** and other **large, contiguous outputs** generated by your training runs. + In contrast, Iopstor uses high-performance NVMe drives, which excel at handling **IOPS-intensive workloads** involving frequent, random access. 
This makes it a better choice for storing **training datasets**, especially when accessed randomly during machine learning training. + +### Scratch Usage Recommendations + +Use Iopstor scratch (`$SCRATCH`) for: + * Training and validation datasets that are read frequently and non-sequentially. + * Workloads that perform many small, random I/O operations. + +Use Capstor scratch (`/capstor/scratch/cscs/$USER`) for: + * Storing model checkpoints. + * Outputs from simulations or training jobs that involve large, contiguous I/O. + +After your job completes, remember to transfer any important results to your permanent project storage. ### Project From 1045448433bc534d4e2a47ee8229fc5c2899a233 Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:17:38 +0200 Subject: [PATCH 4/9] formatting. --- docs/platforms/mlp/index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index 9ac6b739..685634f8 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -66,12 +66,12 @@ Scratch is per user - each user gets separate scratch path and quota. ### Scratch Usage Recommendations Use Iopstor scratch (`$SCRATCH`) for: - * Training and validation datasets that are read frequently and non-sequentially. - * Workloads that perform many small, random I/O operations. +* Training and validation datasets that are read frequently and non-sequentially. +* Workloads that perform many small, random I/O operations. Use Capstor scratch (`/capstor/scratch/cscs/$USER`) for: - * Storing model checkpoints. - * Outputs from simulations or training jobs that involve large, contiguous I/O. +* Storing model checkpoints. +* Outputs from simulations or training jobs that involve large, contiguous I/O. After your job completes, remember to transfer any important results to your permanent project storage. 
From da7d5727035cd4e3baf3fc59f5b66f3759bbfc9d Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:21:12 +0200 Subject: [PATCH 5/9] formatting --- docs/platforms/mlp/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index 685634f8..0503ecb9 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -66,10 +66,12 @@ Scratch is per user - each user gets separate scratch path and quota. ### Scratch Usage Recommendations Use Iopstor scratch (`$SCRATCH`) for: + * Training and validation datasets that are read frequently and non-sequentially. * Workloads that perform many small, random I/O operations. Use Capstor scratch (`/capstor/scratch/cscs/$USER`) for: + * Storing model checkpoints. * Outputs from simulations or training jobs that involve large, contiguous I/O. From fbf51b3afe906c8c298b8c7ecf14f36713d80a1f Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:25:40 +0200 Subject: [PATCH 6/9] try expanding filesystem table... --- docs/platforms/mlp/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index 0503ecb9..36db4c87 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -37,6 +37,7 @@ There are three main file systems mounted on the MLP clusters Clariden and Brist | -- | -- | -- | | Home | /users/$USER | [VAST][ref-alps-vast] | | Scratch | `/iopsstor/scratch/cscs/$USER` | [Iopsstor][ref-alps-iopsstor] | +| | `/iopsstor/scratch/cscs/$USER` | [Capstor][ref-alps-iopsstor] | | Project | `/capstor/store/cscs/swissai/` | [Capstor][ref-alps-capstor] | ### Home From 679cd72cb6e1d98e7f7a8e3f37ca6cc9a9a4b4bc Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:26:05 +0200 Subject: [PATCH 7/9] typo. 
--- docs/platforms/mlp/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index 36db4c87..ccda2101 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -35,7 +35,7 @@ There are three main file systems mounted on the MLP clusters Clariden and Brist | type |mount | filesystem | | -- | -- | -- | -| Home | /users/$USER | [VAST][ref-alps-vast] | +| Home | `/users/$USER` | [VAST][ref-alps-vast] | | Scratch | `/iopsstor/scratch/cscs/$USER` | [Iopsstor][ref-alps-iopsstor] | | | `/iopsstor/scratch/cscs/$USER` | [Capstor][ref-alps-iopsstor] | | Project | `/capstor/store/cscs/swissai/` | [Capstor][ref-alps-capstor] | From 4a0dc099c91778cc11ced8257c86991e509332e9 Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:26:20 +0200 Subject: [PATCH 8/9] typo. --- docs/platforms/mlp/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index ccda2101..9b0b0e5a 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -37,7 +37,7 @@ There are three main file systems mounted on the MLP clusters Clariden and Brist | -- | -- | -- | | Home | `/users/$USER` | [VAST][ref-alps-vast] | | Scratch | `/iopsstor/scratch/cscs/$USER` | [Iopsstor][ref-alps-iopsstor] | -| | `/iopsstor/scratch/cscs/$USER` | [Capstor][ref-alps-iopsstor] | +| | `/capstor/scratch/cscs/$USER` | [Capstor][ref-alps-iopsstor] | | Project | `/capstor/store/cscs/swissai/` | [Capstor][ref-alps-capstor] | ### Home From 1578a3c607d32fe43024bbbf39552e5c834d5985 Mon Sep 17 00:00:00 2001 From: "Nick J. Browning" Date: Fri, 6 Jun 2025 10:26:31 +0200 Subject: [PATCH 9/9] another typo. 
--- docs/platforms/mlp/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/platforms/mlp/index.md b/docs/platforms/mlp/index.md index 9b0b0e5a..e5ab58eb 100644 --- a/docs/platforms/mlp/index.md +++ b/docs/platforms/mlp/index.md @@ -37,7 +37,7 @@ There are three main file systems mounted on the MLP clusters Clariden and Brist | -- | -- | -- | | Home | `/users/$USER` | [VAST][ref-alps-vast] | | Scratch | `/iopsstor/scratch/cscs/$USER` | [Iopsstor][ref-alps-iopsstor] | -| | `/capstor/scratch/cscs/$USER` | [Capstor][ref-alps-iopsstor] | +| | `/capstor/scratch/cscs/$USER` | [Capstor][ref-alps-capstor] | | Project | `/capstor/store/cscs/swissai/` | [Capstor][ref-alps-capstor] | ### Home
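Reviewer note: the "Scratch Usage Recommendations" text added in patch 3 could be applied in a training script roughly as follows. This is a minimal sketch, not part of the patch series. The two scratch paths and the `SCRATCH` variable come from the patched page; the function name `scratch_dir` and the `"dataset"` / `"checkpoint"` categories are hypothetical names introduced here for illustration.

```python
import os
from pathlib import Path

# Mount points as listed in the patched filesystem table.
IOPSSTOR_SCRATCH = "/iopsstor/scratch/cscs/{user}"
CAPSTOR_SCRATCH = "/capstor/scratch/cscs/{user}"


def scratch_dir(kind, user, env=None):
    """Return the scratch directory suggested for a kind of data.

    `kind` is a hypothetical label: "dataset" for data read frequently and
    randomly, "checkpoint" for large, contiguous outputs.
    """
    env = os.environ if env is None else env
    if kind == "dataset":
        # NVMe-backed scratch for IOPS-intensive access; prefer $SCRATCH,
        # which the patched page says is set automatically at login.
        return Path(env.get("SCRATCH", IOPSSTOR_SCRATCH.format(user=user)))
    if kind == "checkpoint":
        # HDD-backed scratch, suited to large sequential reads and writes.
        return Path(CAPSTOR_SCRATCH.format(user=user))
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    user = os.environ.get("USER", "someuser")
    print("datasets:   ", scratch_dir("dataset", user))
    print("checkpoints:", scratch_dir("checkpoint", user))
```

Both locations are scratch and therefore subject to the 30-day cleanup policy quoted in the diff context, so results that must be kept still need to be copied to project storage after the job completes.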