171 changes: 0 additions & 171 deletions compute_transfer_examples/README.md

This file was deleted.

27 changes: 27 additions & 0 deletions compute_transfer_examples/tar_and_transfer/.doc_config.yaml
@@ -0,0 +1,27 @@
# configuration for conversion to docs.globus.org
title: 'Tar and Transfer Files with Compute'
short_description: |
  Use Globus Compute to bundle files into a tarball, which you then transfer
  using Globus Transfer.

  Two examples are included here, one in which the files are located on the
  server which runs Globus Compute, and one in which the files are on a user's
  machine and must be moved to the Compute host.

example_dir: 'compute_tar_and_transfer'
append_source_blocks: false
index_source:
  concat:
    files:
      - 'README.adoc'
      - 'register_function.adoc'
      - 'example_flow1.adoc'
      - 'example_flow2.adoc'
    include_files:
      - 'compute_transfer_example_1_definition.json'
      - 'compute_transfer_example_1_schema.json'
      - 'compute_transfer_example_2_definition.json'
      - 'compute_transfer_example_2_schema.json'
      - 'register_compute_function.py'

menu_weight: 400
48 changes: 48 additions & 0 deletions compute_transfer_examples/tar_and_transfer/README.adoc
@@ -0,0 +1,48 @@
= Tar and Transfer with Globus Compute

These examples demonstrate how to build **flow**s that combine Globus Compute and Globus Transfer to process and move data.

Each of these examples creates an archive file from the user's files and transfers that archive to a destination.
In one case the source data is already on the server running Globus Connect Server and Globus Compute, and in the other it is on a source **collection** owned by the end user.

== Prerequisites

To run these examples, you must have a properly configured server and some local software installed.

You must have a co-located Globus Connect Server Collection and Globus Compute **endpoint**, either hosted on the same server or at least with access to a shared filesystem.

Globus Connect Server Collection::
+
You can follow
link:https://docs.globus.org/globus-connect-server/v5.4/[this guide for setting up a Globus Connect Server Collection]
to install Globus Connect Server and configure a **collection**.
+
For ease of use, we recommend using a Guest Collection.
[Review comment -- Collaborator]
I'm not sure we want to recommend this, because even though this is a good general recommendation, it may actually be more complicated for them to get right given that it makes it harder to reason about the path transformation needed? (Admittedly, though, this is such an expert-user/admin feature that I'm still trying to figure out how worried to be about this...)

[Reply -- Member, Author]
It's not a clear win in all cases -- definitely a tradeoff. I'd be more gun-shy about it if we weren't providing the flow.
The flow already contains the complexity around the base path manipulations. Given that the major cost of supporting Guest Collections in the flow has already been paid, I'm therefore pretty well inclined to stick with this as our guidance.


Globus Compute Endpoint::
+
link:https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html[This guide for setting up a Globus Compute Endpoint]
covers installation of the Globus Compute software.
+
This Compute **endpoint** must have read/write permissions on the same storage location where the Globus Connect Server **collection** is hosted.

Globus CLI::
+
You will also need the Globus CLI installed (link:https://docs.globus.org/cli/#installation[CLI installation docs]).
+
The Globus CLI documentation recommends installing with `pipx`, as in `pipx install globus-cli`.

Globus Compute SDK::
+
You must have the `globus-compute-sdk` Python package available.
We strongly recommend using a virtual environment for this installation; install with `pip install globus-compute-sdk`.
+
You can follow
link:https://globus-compute.readthedocs.io/en/stable/quickstart.html#installation[the Globus Compute install documentation]
to install the Compute SDK client package in a virtualenv.
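
The prerequisite installs above can be sketched as a short shell session. This is only an illustrative sketch, assuming a POSIX shell; the virtual environment path is an arbitrary example, not a required location:

[source,bash,role=clippable-code]
----
# create and activate a virtual environment for the Compute SDK
python3 -m venv ~/compute-env
. ~/compute-env/bin/activate
pip install globus-compute-sdk

# install the Globus CLI with pipx, as the CLI docs recommend
pipx install globus-cli

# log in so that later CLI commands are authorized
globus login
----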

ifdef::env-github[]
== Next: Learn About the `do_tar` Compute **Function**

link:./register_function.adoc[Register the `do_tar` Compute **Function**.]
endif::[]
102 changes: 102 additions & 0 deletions compute_transfer_examples/tar_and_transfer/example_flow1.adoc
@@ -0,0 +1,102 @@
== Example Flow 1

In this first example, the Compute and Transfer **flow** takes a user-provided list of source files that already exist in the **collection**.

The **flow** creates a tarfile from those files and transfers the tarfile to a user-provided destination collection.

The **flow** will:

1. Set constants for the **run**
2. Create an output directory named after the **run**'s ID on the GCS collection
3. Invoke the `do_tar` **function** to create a tar archive from the input source files and save it in the output directory
4. Transfer the resulting tarfile to the destination collection provided in the **flow** input
5. Delete the output directory
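
The provided definition file contains the actual states for these steps. As a point of orientation, a Transfer step in a Globus flow definition generally looks something like the following hypothetical sketch; the state name, parameter paths, and `ResultPath` here are invented for illustration and do not match the example definition exactly:

[source,json,role=clippable-code]
----
{
  "TransferTarfile": {
    "Type": "Action",
    "ActionUrl": "https://transfer.actions.globus.org/transfer",
    "Parameters": {
      "source_endpoint.$": "$.gcs_endpoint_id",
      "destination_endpoint.$": "$.destination_endpoint_id",
      "DATA": [
        {
          "source_path.$": "$.tarfile_path",
          "destination_path.$": "$.destination_path"
        }
      ]
    },
    "ResultPath": "$.TransferResult",
    "End": true
  }
}
----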

=== Create the **Flow**

1. Edit `compute_transfer_example_1_definition.json` and replace the placeholder values:

- `gcs_endpoint_id`: The **collection** ID
- `compute_endpoint_id`: The Compute **endpoint** ID
- `compute_function_id`: The UUID of the registered `do_tar` **function**

If the **collection** has a configured base path, also edit `gcs_base_path`.

2. Create the **flow**:
+
[source,bash,role=clippable-code]
----
globus flows create "Compute and Transfer Flow Example 1" \
./compute_transfer_example_1_definition.json \
--input-schema ./compute_transfer_example_1_schema.json
----

3. Save the **flow** ID returned by this command
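
If you prefer to capture the **flow** ID in a variable rather than copying it from the output, the CLI's JSON output format can help. A sketch, assuming the `-F json` output format and that the created flow's ID is returned in an `id` field:

[source,bash,role=clippable-code]
----
# create the flow and extract its ID from the JSON output
FLOW_ID="$(globus flows create "Compute and Transfer Flow Example 1" \
    ./compute_transfer_example_1_definition.json \
    --input-schema ./compute_transfer_example_1_schema.json \
    -F json \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])')"
echo "$FLOW_ID"
----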

ifndef::env-github[]
[.accordionize]
--
.compute_transfer_example_1_definition.json
[%collapsible]
====
[source,json,role=clippable-code]
----
include::compute_transfer_example_1_definition.json[]
----
====
.compute_transfer_example_1_schema.json
[%collapsible]
====
[source,json,role=clippable-code]
----
include::compute_transfer_example_1_schema.json[]
----
====
--
endif::[]

=== Run the **Flow**

1. Create the **flow** input JSON file:
+
[source,json,role=clippable-code]
----
{
"source_paths": ["/path/to/file1", "/path/to/file2"],
"destination_path": "/path/to/your/destination/file.tar.gz",
"destination_endpoint_id": "your-destination-endpoint-uuid"
}
----

2. Start the **flow**:
+
[source,bash,role=clippable-code]
----
globus flows start "$FLOW_ID" \
--input "<FLOW INPUT FILE>" \
--label "Compute and Transfer Flow Example 1 Run"
----
+
And save the **run** ID for use in the next command.

3. Monitor the **run** progress:
+
[source,bash,role=clippable-code]
----
globus flows run show "<RUN_ID>"
----
** At this point, the **run** _may_ become `INACTIVE`, depending on the type of **collection** being used.
** If a **run** becomes inactive due to data access requirements, this can be resolved by resuming the **run** and following the prompts:
+
[source,bash,role=clippable-code]
----
globus flows run resume "<RUN_ID>"
----
+
When prompted, run `globus session consent` and rerun `globus flows run resume` to resume the **run**.
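
The monitoring step can also be scripted as a simple polling loop. A sketch, assuming `$RUN_ID` holds the saved **run** ID and that the CLI's `-F json` output includes a `status` field with values such as `ACTIVE`, `INACTIVE`, `SUCCEEDED`, and `FAILED`:

[source,bash,role=clippable-code]
----
# poll the run until it reaches a terminal (or inactive) state
while true; do
    STATUS="$(globus flows run show "$RUN_ID" -F json \
      | python3 -c 'import json, sys; print(json.load(sys.stdin)["status"])')"
    echo "run status: $STATUS"
    case "$STATUS" in
        SUCCEEDED|FAILED)
            break ;;
        INACTIVE)
            echo "run is INACTIVE; try: globus flows run resume \"$RUN_ID\""
            break ;;
    esac
    sleep 15
done
----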

ifdef::env-github[]
== Next: Example Flow 2, with Data on a Separate **Collection**

link:./example_flow2.adoc[Example Flow 2.]
endif::[]