Break up compute example and make it publishable #8
Merged
This file was deleted.
27 changes: 27 additions & 0 deletions
27
compute_transfer_examples/tar_and_transfer/.doc_config.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # configuration for conversion to docs.globus.org | ||
| title: 'Tar and Transfer Files with Compute' | ||
| short_description: | | ||
| Use Globus Compute to bundle files into a tarball, which you then transfer | ||
| using Globus Transfer. | ||
|
|
||
| Two examples are included here, one in which the files are located on the | ||
| server which runs Globus Compute, and one in which the files are on a user's | ||
| machine and must be moved to the Compute host. | ||
|
|
||
| example_dir: 'compute_tar_and_transfer' | ||
| append_source_blocks: false | ||
| index_source: | ||
| concat: | ||
| files: | ||
| - 'README.adoc' | ||
| - 'register_function.adoc' | ||
| - 'example_flow1.adoc' | ||
| - 'example_flow2.adoc' | ||
| include_files: | ||
| - 'compute_transfer_example_1_definition.json' | ||
| - 'compute_transfer_example_1_schema.json' | ||
| - 'compute_transfer_example_2_definition.json' | ||
| - 'compute_transfer_example_2_schema.json' | ||
| - 'register_compute_function.py' | ||
|
|
||
| menu_weight: 400 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| = Tar and Transfer with Globus Compute | ||
|
|
||
| These examples demonstrate how to build **flow**s that combine Globus Compute and Globus Transfer to process and move data. | ||
|
|
||
| Each of these examples creates an archive file from the user's files and transfers that archive to a destination. | ||
| In one case the source data is already on the server running Globus Connect Server and Globus Compute, and in the other it is on a source **collection** owned by the end user. | ||
|
|
||
| == Prerequisites | ||
|
|
||
| To run these examples, you must have a properly configured server and some local software installed. | ||
|
|
||
| You must have a co-located Globus Connect Server Collection and Globus Compute **endpoint**, either hosted on the same server or at least with access to a shared filesystem. | ||
|
|
||
| Globus Connect Server Collection:: | ||
| + | ||
| You can follow | ||
| link:https://docs.globus.org/globus-connect-server/v5.4/[this guide for setting up a Globus Connect Server Collection] | ||
| to install Globus Connect Server and configure a **collection**. | ||
| + | ||
| For ease of use, we recommend using a Guest Collection. | ||
|
|
||
| Globus Compute Endpoint:: | ||
| + | ||
| link:https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html[This guide for setting up a Globus Compute Endpoint] | ||
| covers installation of the Globus Compute software. | ||
| + | ||
| This Compute **endpoint** must have read/write permissions on the same storage location where the Globus Connect Server **collection** is hosted. | ||
|
|
||
| Globus CLI:: | ||
| + | ||
| You will also need the Globus CLI installed (link:https://docs.globus.org/cli/#installation[CLI installation docs]). | ||
| + | ||
| Globus CLI documentation recommends installation with `pipx`, as in `pipx install globus-cli`. | ||
|
|
||
| Globus Compute SDK:: | ||
| + | ||
| You must have the `globus-compute-sdk` Python package available. | ||
| We strongly recommend installing it in a virtual environment, using `pip install globus-compute-sdk`. | ||
| + | ||
| You can follow | ||
| link:https://globus-compute.readthedocs.io/en/stable/quickstart.html#installation[the Globus Compute install documentation] | ||
| to install the Compute SDK client package in a virtualenv. | ||
|
|
||
| ifdef::env-github[] | ||
| == Next: Learn About the `do_tar` Compute **Function** | ||
|
|
||
| link:./register_function.adoc[Register the `do_tar` Compute **Function**.] | ||
| endif::[] | ||
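The local software prerequisites listed in the README above can be checked with a short script. This is a minimal sketch and not part of the PR: the function name `check_prerequisites` is hypothetical, and it only assumes that the Globus CLI installs an executable named `globus` and that the Compute SDK is importable as `globus_compute_sdk`. It does not check the server-side requirements (the collection and the Compute endpoint).

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which local prerequisites appear to be installed.

    Hypothetical helper for illustration; it is not part of the examples.
    """
    return {
        # The Globus CLI provides an executable named `globus`.
        "globus-cli": shutil.which("globus") is not None,
        # The Compute SDK package is imported as `globus_compute_sdk`.
        "globus-compute-sdk": importlib.util.find_spec("globus_compute_sdk")
        is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the virtual environment where the SDK was installed; a CLI installed with `pipx` is found as long as its executable is on `PATH`.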
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
102 changes: 102 additions & 0 deletions
102
compute_transfer_examples/tar_and_transfer/example_flow1.adoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| == Example Flow 1 | ||
|
|
||
| In this first example, the Compute and Transfer **flow** takes a user-provided list of source files that already exist in the **collection**. | ||
|
||
|
|
||
| The **flow** creates a tarfile from those files and transfers the tarfile to a user-provided destination collection. | ||
|
|
||
| The **flow** will: | ||
|
|
||
| 1. Set constants for the **run** | ||
| 2. Create an output directory named after the **run**'s ID on the GCS collection | ||
|
||
| 3. Invoke the `do_tar` **function** to create a tar archive from the input source files and save it in the output directory | ||
| 4. Transfer the resulting tarfile to the destination collection provided in the **flow** input | ||
| 5. Delete the output directory | ||
|
||
|
|
||
| === Create the **Flow** | ||
|
|
||
| 1. Edit `compute_transfer_example_1_definition.json` and replace the placeholder values: | ||
|
|
||
| - `gcs_endpoint_id`: The **collection** ID | ||
|
||
| - `compute_endpoint_id`: The Compute **endpoint** ID | ||
| - `compute_function_id`: The UUID of the registered `do_tar` **function** | ||
|
|
||
| If the **collection** has a configured base path, also edit `gcs_base_path`. | ||
|
|
||
| 2. Create the **flow**: | ||
| + | ||
| [source,bash,role=clippable-code] | ||
| ---- | ||
| globus flows create "Compute and Transfer Flow Example 1" \ | ||
| ./compute_transfer_example_1_definition.json \ | ||
| --input-schema ./compute_transfer_example_1_schema.json | ||
| ---- | ||
|
|
||
| 3. Save the **flow** ID returned by this command | ||
|
|
||
| ifndef::env-github[] | ||
| [.accordionize] | ||
| -- | ||
| .compute_transfer_example_1_definition.json | ||
| [%collapsible] | ||
| ==== | ||
| [source,json,role=clippable-code] | ||
| ---- | ||
| include::compute_transfer_example_1_definition.json[] | ||
| ---- | ||
| ==== | ||
| .compute_transfer_example_1_schema.json | ||
| [%collapsible] | ||
| ==== | ||
| [source,json,role=clippable-code] | ||
| ---- | ||
| include::compute_transfer_example_1_schema.json[] | ||
| ---- | ||
| ==== | ||
| -- | ||
| endif::[] | ||
|
|
||
| === Run the **Flow** | ||
|
|
||
| 1. Create the **flow** input JSON file: | ||
| + | ||
| [source,json,role=clippable-code] | ||
| ---- | ||
| { | ||
| "source_paths": ["/path/to/file1", "/path/to/file2"], | ||
| "destination_path": "/path/to/your/destination/file.tar.gz", | ||
| "destination_endpoint_id": "your-destination-endpoint-uuid" | ||
| } | ||
| ---- | ||
|
|
||
| 2. Start the **flow**: | ||
| + | ||
| [source,bash,role=clippable-code] | ||
| ---- | ||
| globus flows start "$FLOW_ID" \ | ||
| --input "<FLOW INPUT FILE>" \ | ||
| --label "Compute and Transfer Flow Example 1 Run" | ||
| ---- | ||
| + | ||
| Save the **run** ID for use in the next command. | ||
|
|
||
| 3. Monitor the **run** progress: | ||
| + | ||
| [source,bash,role=clippable-code] | ||
| ---- | ||
| globus flows run show "<RUN_ID>" | ||
| ---- | ||
| ** At this point, the **run** _may_ become `INACTIVE`, depending on the type of **collection** being used. | ||
| ** If the **run** is inactive because of data access requirements, resume the **run** and follow the prompts: | ||
| + | ||
| [source,bash,role=clippable-code] | ||
| ---- | ||
| globus flows run resume "<RUN_ID>" | ||
| ---- | ||
| + | ||
| When prompted, run `globus session consent` and rerun `globus flows run resume` to resume the **run**. | ||
|
|
||
| ifdef::env-github[] | ||
| == Next: Example Flow 2, with Data on a Separate **Collection** | ||
|
|
||
| link:./example_flow2.adoc[Example Flow 2.] | ||
| endif::[] | ||
I'm not sure we want to recommend this, because even though this is a good general recommendation, it may actually be more complicated for them to get right given that it makes it harder to reason about the path transformation needed? (Admittedly, though, this is such an expert-user/admin feature that I'm still trying to figure out how worried to be about this...)
It's not a clear win in all cases -- definitely a tradeoff. I'd be more gunshy about it if we weren't providing the flow.
The flow already contains the complexity around the base path manipulations. Given that the major cost of supporting Guest Collections in the flow has already been paid, I'm therefore pretty well inclined to stick with this as our guidance.
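To make the path transformation discussed in this thread concrete, here is a small sketch of the general idea, assuming a collection whose root maps to a base directory on the host filesystem: the Compute function works with filesystem paths, while Transfer works with paths relative to the collection root. The function names and the `/data/guest` base path are hypothetical, and this is not the logic in the flow definition itself, which handles `gcs_base_path` in its own way.

```python
from pathlib import PurePosixPath


def to_collection_path(filesystem_path: str, base_path: str) -> str:
    """Translate a host filesystem path into a collection-relative path.

    Raises ValueError if filesystem_path is not located under base_path.
    """
    relative = PurePosixPath(filesystem_path).relative_to(PurePosixPath(base_path))
    return str(PurePosixPath("/") / relative)


def to_filesystem_path(collection_path: str, base_path: str) -> str:
    """Translate an absolute collection path into a host filesystem path.

    Raises ValueError if collection_path is not absolute.
    """
    relative = PurePosixPath(collection_path).relative_to("/")
    return str(PurePosixPath(base_path) / relative)


if __name__ == "__main__":
    base = "/data/guest"  # hypothetical base path of the collection
    print(to_collection_path("/data/guest/run1/out.tar.gz", base))
    print(to_filesystem_path("/run1/out.tar.gz", base))
```

With a base path of `/`, both functions return their input unchanged, which is the simple case where no translation is needed.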