171 changes: 0 additions & 171 deletions compute_transfer_examples/README.md

This file was deleted.

27 changes: 27 additions & 0 deletions compute_transfer_examples/tar_and_transfer/.doc_config.yaml
@@ -0,0 +1,27 @@
# configuration for conversion to docs.globus.org
title: 'Tar and Transfer Files with Compute'
short_description: |
  Use Globus Compute to bundle files into a tarball, which you then transfer
  using Globus Transfer.

  Two examples are included here, one in which the files are located on the
  server which runs Globus Compute, and one in which the files are on a user's
  machine and must be moved to the Compute host.

example_dir: 'compute_tar_and_transfer'
append_source_blocks: false
index_source:
  concat:
    files:
      - 'README.adoc'
      - 'register_function.adoc'
      - 'example_flow1.adoc'
      - 'example_flow2.adoc'
    include_files:
      - 'compute_transfer_example_1_definition.json'
      - 'compute_transfer_example_1_schema.json'
      - 'compute_transfer_example_2_definition.json'
      - 'compute_transfer_example_2_schema.json'
      - 'register_compute_function.py'

menu_weight: 400
48 changes: 48 additions & 0 deletions compute_transfer_examples/tar_and_transfer/README.adoc
@@ -0,0 +1,48 @@
= Tar and Transfer with Globus Compute

These examples demonstrate how to build **flow**s that combine Globus Compute and Globus Transfer to process and move data.

Each of these examples creates an archive file from the user's files and transfers that archive to a destination.
In one case the source data is already on the server running Globus Connect Server and Globus Compute, and in the other it is on a source **collection** owned by the end user.

== Prerequisites

To run these examples, you must have a properly configured server and some local software installed.

You must have a co-located Globus Connect Server Collection and Globus Compute **endpoint**, either hosted on the same server or at least with access to a shared filesystem.

Globus Connect Server Collection::
+
You can follow
link:https://docs.globus.org/globus-connect-server/v5.4/[this guide for setting up a Globus Connect Server Collection]
to install Globus Connect Server and configure a **collection**.
+
For ease of use, we recommend using a Guest Collection.
[Review comment -- Collaborator]
I'm not sure we want to recommend this, because even though this is a good general recommendation, it may actually be more complicated for them to get right given that it makes it harder to reason about the path transformation needed? (Admittedly, though, this is such an expert-user/admin feature that I'm still trying to figure out how worried to be about this...)

[Reply -- Member, Author]
It's not a clear win in all cases -- definitely a tradeoff. I'd be more gun-shy about it if we weren't providing the flow.
The flow already contains the complexity around the base path manipulations. Given that the major cost of supporting Guest Collections in the flow has already been paid, I'm therefore pretty well inclined to stick with this as our guidance.


Globus Compute Endpoint::
+
link:https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html[This guide for setting up a Globus Compute Endpoint]
covers installation of the Globus Compute software.
+
This Compute **endpoint** must have read/write permissions on the same storage location where the Globus Connect Server **collection** is hosted.

Globus CLI::
+
You will also need the Globus CLI installed (link:https://docs.globus.org/cli/#installation[CLI installation docs]).
+
The Globus CLI documentation recommends installing with `pipx`, as in `pipx install globus-cli`.

Globus Compute SDK::
+
You must have the `globus-compute-sdk` Python package available.
We strongly recommend using a virtual environment for this installation; install with `pip install globus-compute-sdk`.
+
You can follow
link:https://globus-compute.readthedocs.io/en/stable/quickstart.html#installation[the Globus Compute install documentation]
to install the Compute SDK client package in a virtualenv.
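
The prerequisite installs above can be sketched as a short shell session. This is only an illustrative sketch, assuming a POSIX shell; the virtual environment path is an arbitrary example, not a required location:

[source,bash,role=clippable-code]
----
# create and activate a virtual environment for the Compute SDK
python3 -m venv ~/compute-env
. ~/compute-env/bin/activate
pip install globus-compute-sdk

# install the Globus CLI with pipx, as the CLI docs recommend
pipx install globus-cli

# log in so that later CLI commands are authorized
globus login
----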

ifdef::env-github[]
== Next: Learn About the `do_tar` Compute **Function**

link:./register_function.adoc[Register the `do_tar` Compute **Function**.]
endif::[]
102 changes: 102 additions & 0 deletions compute_transfer_examples/tar_and_transfer/example_flow1.adoc
@@ -0,0 +1,102 @@
== Example Flow 1

In this first example, the Compute and Transfer **flow** takes a user-provided list of source files that already exist in the **collection**.

The **flow** creates a tarfile from those files and transfers the tarfile to a user-provided destination collection.

The **flow** will:

1. Set constants for the **run**
2. Create an output directory named after the **run**'s ID on the GCS collection
3. Invoke the `do_tar` **function** to create a tar archive from the input source files and save it in the output directory
4. Transfer the resulting tarfile to the destination collection provided in the **flow** input
5. Delete the output directory
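
The provided definition file contains the actual states for these steps. As a point of orientation, a Transfer step in a Globus flow definition generally looks something like the following hypothetical sketch; the state name, parameter paths, and `ResultPath` here are invented for illustration and do not match the example definition exactly:

[source,json,role=clippable-code]
----
{
  "TransferTarfile": {
    "Type": "Action",
    "ActionUrl": "https://transfer.actions.globus.org/transfer",
    "Parameters": {
      "source_endpoint.$": "$.gcs_endpoint_id",
      "destination_endpoint.$": "$.destination_endpoint_id",
      "DATA": [
        {
          "source_path.$": "$.tarfile_path",
          "destination_path.$": "$.destination_path"
        }
      ]
    },
    "ResultPath": "$.TransferResult",
    "End": true
  }
}
----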

=== Create the **Flow**

1. Edit `compute_transfer_example_1_definition.json` and replace the placeholder values:

- `gcs_endpoint_id`: The **collection** ID
- `compute_endpoint_id`: The Compute **endpoint** ID
- `compute_function_id`: The UUID of the registered `do_tar` **function**

If the **collection** has a configured base path, also edit `gcs_base_path`.

2. Create the **flow**:
+
[source,bash,role=clippable-code]
----
globus flows create "Compute and Transfer Flow Example 1" \
./compute_transfer_example_1_definition.json \
--input-schema ./compute_transfer_example_1_schema.json
----

3. Save the **flow** ID returned by this command
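
If you prefer to capture the **flow** ID in a variable rather than copying it from the output, the CLI's JSON output format can help. A sketch, assuming the `-F json` output format and that the created flow's ID is returned in an `id` field:

[source,bash,role=clippable-code]
----
# create the flow and extract its ID from the JSON output
FLOW_ID="$(globus flows create "Compute and Transfer Flow Example 1" \
    ./compute_transfer_example_1_definition.json \
    --input-schema ./compute_transfer_example_1_schema.json \
    -F json \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])')"
echo "$FLOW_ID"
----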

ifndef::env-github[]
[.accordionize]
--
.compute_transfer_example_1_definition.json
[%collapsible]
====
[source,json,role=clippable-code]
----
include::compute_transfer_example_1_definition.json[]
----
====
.compute_transfer_example_1_schema.json
[%collapsible]
====
[source,json,role=clippable-code]
----
include::compute_transfer_example_1_schema.json[]
----
====
--
endif::[]

=== Run the **Flow**

1. Create the **flow** input JSON file:
+
[source,json,role=clippable-code]
----
{
"source_paths": ["/path/to/file1", "/path/to/file2"],
"destination_path": "/path/to/your/destination/file.tar.gz",
"destination_endpoint_id": "your-destination-endpoint-uuid"
}
----

2. Start the **flow**:
+
[source,bash,role=clippable-code]
----
globus flows start "$FLOW_ID" \
--input "<FLOW INPUT FILE>" \
--label "Compute and Transfer Flow Example 1 Run"
----
+
And save the **run** ID for use in the next command.

3. Monitor the **run** progress:
+
[source,bash,role=clippable-code]
----
globus flows run show "<RUN_ID>"
----
** At this point, the **run** _may_ become `INACTIVE`, depending on the type of **collection** being used.
** If a **run** becomes inactive due to data access requirements, this can be resolved by resuming the **run** and following the prompts:
+
[source,bash,role=clippable-code]
----
globus flows run resume "<RUN_ID>"
----
+
When prompted, run `globus session consent` and rerun `globus flows run resume` to resume the **run**.
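
The monitoring step can also be scripted as a simple polling loop. A sketch, assuming `$RUN_ID` holds the saved **run** ID and that the CLI's `-F json` output includes a `status` field with values such as `ACTIVE`, `INACTIVE`, `SUCCEEDED`, and `FAILED`:

[source,bash,role=clippable-code]
----
# poll the run until it reaches a terminal (or inactive) state
while true; do
    STATUS="$(globus flows run show "$RUN_ID" -F json \
      | python3 -c 'import json, sys; print(json.load(sys.stdin)["status"])')"
    echo "run status: $STATUS"
    case "$STATUS" in
        SUCCEEDED|FAILED)
            break ;;
        INACTIVE)
            echo "run is INACTIVE; try: globus flows run resume \"$RUN_ID\""
            break ;;
    esac
    sleep 15
done
----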

ifdef::env-github[]
== Next: Example Flow 2, with Data on a Separate **Collection**

link:./example_flow2.adoc[Example Flow 2.]
endif::[]