Full documentation for stashing (#6936)

khsrali · web-flow · commit 81dd4df5d655 · 2025-07-18T11:37:00.000+02:00
Following commit b2a6e2, this commit updates the documentation on stashing for calculation jobs.
diff --git a/docs/source/topics/calculations/usage.rst b/docs/source/topics/calculations/usage.rst
@@ -603,15 +603,67 @@ The order can be controlled through the ``file_copy_operation_order`` attribute
 
 .. _topics:calculations:usage:calcjobs:stashing:
 
-Stashing on the remote
-~~~~~~~~~~~~~~~~~~~~~~
 
-The ``stash`` option namespace allows a user to specify certain files and/or folders that are created by the calculation job to be stashed somewhere on the remote where the job is run.
-This can be useful if these need to be stored for a longer time on a machine where the scratch space is cleaned regularly, but they need to be kept on the remote machine and not retrieved.
-Examples are files that are necessary to restart a calculation but are too big to be retrieved and stored permanently in the local file repository.
+Stashing Files on the Remote
+----------------------------
+
+
+In many scientific workflows, calculations produce files that are either too large to retrieve into your local AiiDA repository or simply not needed locally. You may still want to keep these files available on the remote machine (for example, to facilitate restarts, to enable debugging, or for archiving), but outside the compute or scratch directory, which may be cleaned up regularly.
+
+AiiDA offers a stashing mechanism to help with this: it can automatically copy or archive specified files to a persistent location on the remote computer, either immediately after the calculation completes or through a separate follow-up calculation job.
+
+Below, we briefly describe the two supported methods for remote stashing and provide guidance on how to choose the best approach for your use case.
+
+Which method should I use?
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :header-rows: 1
+
+   * - Scenario
+     - Recommended method
+   * - Stash files regardless of calculation outcome (even if failed)
+     - Method 1: Stashing **Immediately After Job Completion on HPC**
+   * - Stash files from an already completed calculation
+     - Method 2: Stashing via a **Separate Calculation Job**
+   * - Stash files only if the calculation succeeded
+     - Method 2: Stashing via a **Separate Calculation Job**
+   * - Stash files with your own custom script
+     - Method 2: Stashing via a **Separate Calculation Job**
+
+Quick comparison between these methods:
+
+::
+
+   (Method 1) Immediate stashing:
+   +---------------------+      +--------------------------------+
+   |  Calculation job    | ---> | Stash files with no submission |
+   +---------------------+      |         (before retrieve)      |
+                                +--------------------------------+
+                                             |
+                                             v
+                                +------------------------+
+                                | Retrieve & parse files |
+                                +------------------------+
+
+   (Method 2) Post-completion stashing:
+   +---------------------+      +------------------------+
+   |  Calculation job    | ---> | Retrieve & parse files |
+   +---------------------+      +------------------------+
+                                             |
+                                             v
+                                +------------------------+
+                                |    StashCalculation    |
+                                |     (separate job)     |
+                                +------------------------+
+                                             |
+                                             v
+                                +---------------------------------+
+                                | Stash files with no submission  |
+                                |               or                |
+                                | Submit a custom script          |
+                                +---------------------------------+
+
 
-The files/folder that need to be stashed are specified through their relative filepaths within the working directory in the ``stash.source_list`` option.
-Using the ``COPY`` mode, the target path defines another location (on the same filesystem as the calculation) to copy the files to, and is set through the ``stash.target_base`` option, for example:
+Method 1: Stashing Immediately After Job Completion on HPC
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This approach performs the stashing as soon as the calculation finishes, **before** any files are retrieved or parsed. It supports the stash modes ``COPY``, ``COMPRESS_TAR``, ``COMPRESS_TARBZ2``, ``COMPRESS_TARGZ``, and ``COMPRESS_TARXZ``.
+
+**Typical use case:** You need to preserve output files from all runs, even failed ones, for debugging or restarting purposes.
+
+Specify the files or folders to stash with the ``stash.source_list`` option, as paths relative to the calculation's working directory, and the destination on the remote with the ``stash.target_base`` option, for example:
 
 .. code-block:: python
 
@@ -623,28 +675,28 @@ Using the ``COPY`` mode, the target path defines another location (on the same f
        'metadata': {
            'options': {
                'stash': {
-                   'source_list': ['aiida.out', 'output.txt'],
-                   'target_base': '/storage/project/stash_folder',
                    'stash_mode': StashMode.COPY.value,
+                   'target_base': '/storage/project/stash_folder',
+                   'source_list': ['aiida.out', 'output.txt'],
                }
            }
        }
    }
 
-.. note::
-    In addition to the ``COPY`` mode, the following modes, these storage efficient modes are also are available:
-    ``COMPRESS_TAR``, ``COMPRESS_TARBZ2``, ``COMPRESS_TARGZ``, ``COMPRESS_TARXZ``.
-
-The stashed files and folders are represented by an output node that is attached to the calculation node through the label ``remote_stash``, as a ``RemoteStashFolderData`` node.
-Just like the ``remote_folder`` node, this represents a location or files on a remote machine and so is equivalent to a "symbolic link".
+The stashed files are represented by an output node with the label ``remote_stash`` (an instance of ``RemoteStashFolderData``), attached to the calculation node. This node acts like a "symbolic link" pointing to the location on the remote system.
 
 .. important::
 
-   If the ``stash`` option namespace is defined for a generic calculation job, the daemon will perform the stashing operations before the files are retrieved.
-   This means that the stashing happens before the parsing of the output files (which occurs after the retrieving step), such that that the files will be stashed independent of the final exit status that the parser will assign to the calculation job.
-   This may cause files to be stashed for calculations that will later be considered to have failed.
+   The stashing operation occurs *before* any file retrieval or parsing. As a result, files may be stashed even for calculations that later turn out to have failed.
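To make the difference between the stash modes concrete, here is a purely local, standard-library sketch of what ``COPY`` and the ``COMPRESS_*`` modes amount to. It is an illustration only, not AiiDA's implementation: AiiDA performs the operation on the remote machine through the computer's transport, and the function name ``stash_locally``, the mapping of modes to ``tarfile`` write modes, and the archive name ``stash.<suffix>`` are all assumptions made for this example.

```python
import shutil
import tarfile
from pathlib import Path

# Assumed mapping from the compress stash modes to tarfile write modes and suffixes
# (inferred from the mode names; AiiDA's own archive naming may differ).
TAR_MODES = {
    'COMPRESS_TAR': ('w', '.tar'),
    'COMPRESS_TARGZ': ('w:gz', '.tar.gz'),
    'COMPRESS_TARBZ2': ('w:bz2', '.tar.bz2'),
    'COMPRESS_TARXZ': ('w:xz', '.tar.xz'),
}


def stash_locally(working_dir, source_list, target_base, mode='COPY'):
    """Hypothetical helper: copy or archive ``source_list`` entries into ``target_base``.

    Entries of ``source_list`` are paths relative to ``working_dir``.
    Returns the target directory (COPY) or the path of the created archive.
    """
    working_dir = Path(working_dir)
    target_base = Path(target_base)
    target_base.mkdir(parents=True, exist_ok=True)

    if mode == 'COPY':
        # Copy each entry, keeping its relative layout under the target directory.
        for relative in source_list:
            source = working_dir / relative
            destination = target_base / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            if source.is_dir():
                shutil.copytree(source, destination, dirs_exist_ok=True)
            else:
                shutil.copy2(source, destination)
        return target_base

    # Any other mode must be one of the compress modes: bundle everything in one archive.
    tar_mode, suffix = TAR_MODES[mode]
    archive = target_base / f'stash{suffix}'
    with tarfile.open(archive, tar_mode) as tar:
        for relative in source_list:
            tar.add(working_dir / relative, arcname=relative)
    return archive
```

The ``COPY`` branch leaves individual files that can be used directly, while the compress branches trade that for a single, smaller archive, which is why they are the more storage-efficient choice for large outputs.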
+
+Method 2: Stashing via a Separate Calculation Job
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This approach runs the stashing as a separate ``core.stash`` calculation (``StashCalculation``) after the original calculation job has finished, copying or archiving files from its remote folder. Because you launch it yourself, you can, for example, stash files **only from successful calculations**.
 
-To avoid this scenario, you can instead, stash via a separate calculation job, for example:
+**Typical use case:** You want to avoid keeping files from failed calculations, or you need to run a custom stashing script.
+
+This method requires passing the ``remote_folder`` output of the original calculation as the ``source_node`` input, for example:
 
 .. code-block:: python
 
@@ -658,12 +710,12 @@ To avoid this scenario, you can instead, stash via a separate calculation job, f
 
     inputs = {
         'metadata': {
-            'computer': load_computer(label="localhost"),
+            'computer': load_computer(label='<COMPUTER_LABEL>'),
             'options': {
                 'stash': {
-                'source_list': ['aiida.out', 'output.txt'],
-                'target_base': '/scratch/',
-                'stash_mode': StashMode.COPY.value,
+                    'stash_mode': StashMode.COPY.value,
+                    'target_base': '/scratch/',
+                    'source_list': ['aiida.out', 'output.txt'],
                 },
             },
         },
@@ -672,10 +724,88 @@ To avoid this scenario, you can instead, stash via a separate calculation job, f
 
     result = run(StashCalculation, **inputs)
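Because the stashing job is launched separately, you decide when to run it; in a script or workflow you would typically check ``is_finished_ok`` on the parent calculation node before building the inputs. The standard-library sketch below shows one way to assemble the ``stash`` options with a few sanity checks first. The helper ``build_stash_options`` and its checks (an absolute ``target_base``, ``source_list`` entries that stay inside the working directory) are this example's own assumptions, not validation rules documented for AiiDA.

```python
from pathlib import PurePosixPath


def build_stash_options(stash_mode, target_base, source_list):
    """Hypothetical helper returning the ``stash`` part of ``metadata.options``.

    ``stash_mode`` is passed through unchanged, e.g. ``StashMode.COPY.value``.
    """
    target = PurePosixPath(target_base)
    # Assumed policy of this helper: the destination must be an absolute remote path.
    if not target.is_absolute():
        raise ValueError(f'target_base must be an absolute path, got {target_base!r}')
    for item in source_list:
        path = PurePosixPath(item)
        # Entries are relative to the calculation's working directory and must not escape it.
        if path.is_absolute() or '..' in path.parts:
            raise ValueError(f'source_list entry leaves the working directory: {item!r}')
    return {
        'stash': {
            'stash_mode': stash_mode,
            'target_base': str(target),
            'source_list': list(source_list),
        }
    }
```

For example, ``build_stash_options(StashMode.COPY.value, '/scratch/', ['aiida.out', 'output.txt'])`` returns the same ``stash`` options as the example above, except that the trailing slash of ``target_base`` is normalized away.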
 
+Custom script stashing (advanced)
+.................................
+
+You can run your own script as the stashing step by using the ``SUBMIT_CUSTOM_CODE`` stash mode.
+First, place your script on the remote machine, make sure it is executable, and register it as an AiiDA code:
+
+.. code-block:: python
+
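+   from aiida.orm import InstalledCode, load_computer
+
+   # Register the script as a code on the remote computer; replace the placeholders.
+   code = InstalledCode(
+       label='<MY_CODE>',
+       default_calc_job_plugin='core.stash',
+       computer=load_computer('<COMPUTER_LABEL>'),
+       filepath_executable='<PATH_TO_SCRIPT.sh>',  # absolute path of the script on the remote
+   )
+   code.store()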
+
+Run the custom stashing job with:
+
+.. code-block:: python
+
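+   from aiida.common.datastructures import StashMode
+   from aiida.engine import submit
+   from aiida.orm import load_code, load_computer, load_node
+   from aiida.plugins import CalculationFactory
+
+   StashCalculation = CalculationFactory('core.stash')
+
+   # The remote folder of the finished calculation whose files should be stashed.
+   source_node = load_node('<CALCJOB_PK_OR_UUID>').outputs.remote_folder
+
+   inputs = {
+       'metadata': {
+           'computer': load_computer('<COMPUTER_LABEL>'),
+           'options': {
+               'resources': {'num_machines': 1},
+               'stash': {
+                   'stash_mode': StashMode.SUBMIT_CUSTOM_CODE.value,
+                   'target_base': '/path/to/stash',
+                   'source_list': ['aiida.out', 'output.txt'],
+               },
+           },
+       },
+       'source_node': source_node,
+       'code': load_code(label='<MY_CODE>'),
+   }
+   submit(StashCalculation, **inputs)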
+
+
+
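+This calculation writes an ``aiida.in`` file in JSON format containing the stashing parameters, for example:
+
+.. code-block:: json
+
+   {"working_directory": "/path/to/remote/working/directory",
+    "source_list": ["aiida.out", "output.txt"],
+    "target_base": "/path/to/stash"}
+
+Here, ``working_directory`` is the remote path of the ``source_node``, as returned by its ``get_remote_path()`` method.
+Your script receives this file on its standard input, and its standard output is written to ``aiida.out``: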
+
+::
+
+    ./script.sh < aiida.in > aiida.out
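To try a script by hand on the remote machine before registering it as a code, you can write an input file with the same three keys as the ``aiida.in`` shown above and redirect it into the script exactly as in the command above. This is a minimal sketch; the helper name ``write_sample_stash_input`` and the example paths are hypothetical.

```python
import json
from pathlib import Path


def write_sample_stash_input(path, working_directory, source_list, target_base):
    """Hypothetical helper: write a JSON file mirroring the keys of ``aiida.in``."""
    payload = {
        'working_directory': str(working_directory),
        'source_list': list(source_list),
        'target_base': str(target_base),
    }
    Path(path).write_text(json.dumps(payload, indent=1))
    return payload
```

For example, call ``write_sample_stash_input('aiida.in', '/path/to/remote/working/directory', ['aiida.out', 'output.txt'], '/path/to/stash')`` and then run ``./script.sh < aiida.in`` to check the script's behaviour before handing it to AiiDA.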
+
+Your script should therefore read and parse this JSON from standard input and then perform the stashing in whatever way you need, for example:
+
+.. code-block:: bash
+
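+   #!/bin/bash
+   # Read the JSON parameters passed on standard input (requires jq).
+   json=$(cat)
+   working_directory=$(printf '%s\n' "$json" | jq -r '.working_directory')
+   target_base=$(printf '%s\n' "$json" | jq -r '.target_base')
+
+   mkdir -p "$target_base"
+   # Read source_list one entry per line so that paths containing spaces stay intact.
+   printf '%s\n' "$json" | jq -r '.source_list[]' | while IFS= read -r item; do
+       # -r also allows folders to be stashed, not only files.
+       if cp -r "$working_directory/$item" "$target_base/"; then
+           echo "$working_directory/$item copied successfully."
+       else
+           echo "Failed to copy $working_directory/$item." >&2
+       fi
+   done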
+
+This way, your script can implement any custom logic, such as tape archiving commands, error handling, or dynamic filtering of files.
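If ``jq`` is not available on the remote machine, the same logic can be written in Python using only the standard library. This is a sketch under two assumptions: a Python 3.8+ interpreter exists on the remote, and, like the bash script above, every entry is copied directly into ``target_base`` under its base name. The function names ``stash`` and ``main`` are hypothetical.

```python
import json
import shutil
import sys
from pathlib import Path


def stash(params):
    """Copy every ``source_list`` entry into ``target_base``; return the entries that failed."""
    working_directory = Path(params['working_directory'])
    target_base = Path(params['target_base'])
    target_base.mkdir(parents=True, exist_ok=True)
    failed = []
    for item in params['source_list']:
        source = working_directory / item
        # Like the bash script, keep only the base name at the destination.
        destination = target_base / Path(item).name
        try:
            if source.is_dir():
                shutil.copytree(source, destination, dirs_exist_ok=True)
            else:
                shutil.copy2(source, destination)
            print(f'{source} copied successfully.')
        except OSError as exc:  # shutil.Error is a subclass of OSError
            print(f'Failed to copy {source}: {exc}', file=sys.stderr)
            failed.append(item)
    return failed


def main(stream=sys.stdin):
    """Read the JSON parameters from ``stream`` and return a process exit code."""
    return 1 if stash(json.load(stream)) else 0
```

To use it as the registered executable, start the file with a ``#!/usr/bin/env python3`` line, make it executable, and end it with ``if __name__ == '__main__': sys.exit(main())``, so that the exit code reflects whether every entry was copied.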
+
+Caveats and best practices
+""""""""""""""""""""""""""
 
 .. important::
 
-   AiiDA does not actually control the files in the remote stash, and so the contents may disappear at some point.
+   - **AiiDA does not manage the files in the remote stash after they are created**: They may be deleted or lost at any time, depending on the cluster's configuration or cleanup policies.
+   - **Check quotas and permissions**: Make sure you have write access and sufficient quota in the target stash directory.
+   - **Handle errors**: If the stashing operation fails (for example, because of missing files or insufficient permissions), AiiDA logs the issue but does not raise an exception. It is your responsibility to check the result and recover as needed.
+   - **Source files are not deleted after stashing**: This prevents unwanted data loss.
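The quota, permission and error-handling points above can be partly automated. The standard-library sketch below checks a prospective stash directory for write permission and free space and, after stashing, lists the entries that did not arrive. It is meant to run on the machine that hosts the stash (for instance inside a custom stashing script). The function names are hypothetical, and ``shutil.disk_usage`` reports free space on the filesystem, not your per-user quota, so on clusters with quotas also consult the site's quota tool.

```python
import os
import shutil
from pathlib import Path


def check_stash_target(target_base, required_bytes=0):
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems = []
    # The target may not exist yet: inspect its closest existing ancestor instead.
    existing = Path(target_base)
    while not existing.exists() and existing != existing.parent:
        existing = existing.parent
    if not os.access(existing, os.W_OK):
        problems.append(f'no write permission in {existing}')
    free = shutil.disk_usage(existing).free
    if free < required_bytes:
        problems.append(f'only {free} bytes free in {existing}, need {required_bytes}')
    return problems


def missing_stashed_items(stash_dir, source_list):
    """Hypothetical post-check: return the ``source_list`` entries absent from ``stash_dir``."""
    stash_dir = Path(stash_dir)
    return [item for item in source_list if not (stash_dir / item).exists()]
```

Point ``stash_dir`` at the directory the files were actually stashed into; an empty list from both functions means no problem was detected, not that the stash is guaranteed to persist.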
+
+
 
 .. _topics:calculations:usage:calcjobs:options: