@@ -0,0 +1,32 @@
# configuration for conversion to docs.globus.org
title: 'Tar and Transfer for collections with an associated flow policy'
short_description: |
  Use Globus Compute to bundle files into a tarball, which you then transfer
  using Globus Transfer.

  Two examples are included here, one in which the files are located on the
  server which runs Globus Compute, and one in which the files are on a user's
  machine and must be moved to the Compute host.

  These two examples are modified versions of the original tar and transfer examples.
  They are expected to be invoked from the Globus webapp when initiating a transfer
  where the source / destination collections have an `associated_flow_policy`
  with this flow.

example_dir: 'collection_transfer_requires_flow'
append_source_blocks: false
index_source:
  concat:
    files:
      - 'README.adoc'
      - 'register_function.adoc'
      - 'example_flow1.adoc'
      - 'example_flow2.adoc'
include_files:
  - 'compute_transfer_example_1_definition.json'
  - 'compute_transfer_example_1_schema.json'
  - 'compute_transfer_example_2_definition.json'
  - 'compute_transfer_example_2_schema.json'
  - 'register_compute_function.py'

menu_weight: 400
@@ -0,0 +1,49 @@
= Tar and Transfer for collections with an associated flow policy

These examples demonstrate how to build **flow**s that are meant to be used as the `associated_flow_policy` for GCS collections.
They are variations of the Tar and Transfer examples: rather than taking the source paths from user input, they parse them from the incoming `globus-transfer-transfer#0.10` transfer data.
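
As a rough illustration of that parsing, the sketch below mirrors in plain Python what the flow definitions in this example express with their `CollectSourcePaths` and `input_data_valid` expressions. The helper names and the sample document are hypothetical; only the `DATA` and `source_path` keys are taken from the flow definitions.

```python
# Hypothetical helpers; the flows do this with expressions, not Python.

def collect_source_paths(transfer_settings):
    """Return the source path of every item in the transfer document."""
    return [item["source_path"] for item in transfer_settings["DATA"]]


def input_data_valid(transfer_settings):
    """Reject the root path '/' and any path containing '~'."""
    return not any(
        "~" in item["source_path"] or item["source_path"] == "/"
        for item in transfer_settings["DATA"]
    )


# Made-up sample input; real transfer documents carry more fields.
sample = {
    "DATA": [
        {"source_path": "/projects/run1/"},
        {"source_path": "/projects/notes.txt"},
    ]
}

print(collect_source_paths(sample))
print(input_data_valid(sample))
```

A document whose items include `/` or a path such as `/~/data` fails the check, which is the condition under which the flows end in the `InvalidTransferData` state.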

Each of these examples creates an archive file from the user's files and transfers that archive to a destination.
In one case the source data is already on the server running Globus Connect Server and Globus Compute, and in the other it is on a source **collection** owned by the end user.
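
In the second case, each source path needs a location inside a per-run working directory on the Compute host before it can be archived. The sketch below shows one way to express that mapping in Python, assuming a working directory of the form `/<run id>/` as in the flow definition. The function name `staged_path` and the sample values are hypothetical and used only for illustration.

```python
import posixpath


def staged_path(source_path, working_dir):
    """Place the last component of source_path under working_dir.

    A trailing slash (a directory) is preserved on the staged path.
    """
    name = posixpath.basename(source_path.rstrip("/"))
    suffix = "/" if source_path.endswith("/") else ""
    return working_dir + name + suffix


# Made-up working directory; the flow derives it from the run ID.
working_dir = "/example-run-id/"

for path in ["/projects/run1/", "/projects/notes.txt"]:
    print(path, "->", staged_path(path, working_dir))
```

Only the final path component is kept, so two source items with the same name would collide in the working directory.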

== Prerequisites

To run these examples, you must have a properly configured server and some local software installed.

You must have a co-located Globus Connect Server Collection and Globus Compute **endpoint**, either hosted on the same server or at least with access to a shared filesystem.

Globus Connect Server Collection::
+
You can follow
link:https://docs.globus.org/globus-connect-server/v5.4/[this guide for setting up a Globus Connect Server Collection]
to install Globus Connect Server and configure a **collection**.
+
For ease of use, we recommend using a Guest Collection.

Globus Compute Endpoint::
+
link:https://globus-compute.readthedocs.io/en/latest/endpoints/installation.html[This guide for setting up a Globus Compute Endpoint]
covers installation of the Globus Compute software.
+
This Compute **endpoint** must have read/write permissions on the same storage location where the Globus Connect Server **collection** is hosted.

Globus CLI::
+
You will also need the Globus CLI installed (link:https://docs.globus.org/cli/#installation[CLI installation docs]).
+
Globus CLI documentation recommends installation with `pipx`, as in `pipx install globus-cli`.

Globus Compute SDK::
+
You must have the `globus-compute-sdk` Python package available.
We strongly recommend installing it in a virtual environment with `pip install globus-compute-sdk`.
+
You can follow
link:https://globus-compute.readthedocs.io/en/stable/quickstart.html#installation[the Globus Compute install documentation]
to install the Compute SDK client package in a virtualenv.
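
The shared-storage requirement above exists because the Compute function has to open, on its local filesystem, the same files that the collection exposes by collection path. How the two line up depends on the collection's base path. The sketch below assumes the simplest mapping, in which a collection path is joined onto a `gcs_base_path` directory. The function name and the example directories are hypothetical, and the real `do_tar` function may translate paths differently.

```python
import posixpath


def to_filesystem_path(collection_path, gcs_base_path="/"):
    """Map a collection path to a path on the Compute host.

    Assumption: the collection's root corresponds to gcs_base_path.
    """
    return posixpath.join(gcs_base_path, collection_path.lstrip("/"))


print(to_filesystem_path("/projects/notes.txt"))
print(to_filesystem_path("/projects/notes.txt", "/data/gcs"))
```

With a base path of `/`, collection paths and filesystem paths coincide, which is the default used by the flow definitions in this example.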

ifdef::env-github[]
== Next: Learn About the `do_tar` Compute **Function**

link:../register_function.adoc[Register the `do_tar` Compute **Function**.]
endif::[]
@@ -0,0 +1,112 @@
{
  "StartAt": "SetRunVariables",
  "States": {
    "SetRunVariables": {
      "Type": "ExpressionEval",
      "Parameters": {
        "gcs_base_path": "/",
        "compute_endpoint_id": "<INSERT YOUR COMPUTE ENDPOINT ID HERE>",
        "compute_function_id": "<INSERT YOUR COMPUTE FUNCTION ID HERE>",
        "compute_output_directory.=": "'/' + _context.run_id + '/'",
        "input_data_valid.=": "['~' in transfer_data.source_path or transfer_data.source_path == '/' for transfer_data in transfer_settings.DATA] == [False] * len(transfer_settings.DATA)"
      },
      "ResultPath": "$.run_vars",
      "Next": "ValidateTransferData"
    },
    "ValidateTransferData": {
      "Comment": "Validate that none of the input source paths are the path '/' or contain the character '~'.",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.run_vars.input_data_valid",
          "BooleanEquals": false,
          "Next": "InvalidTransferData"
        }
      ],
      "Default": "CollectSourcePaths"
    },
    "InvalidTransferData": {
      "Type": "Fail",
      "Error": "InvalidTransferData",
      "Cause": "Invalid source path found in input transfer data."
    },
    "CollectSourcePaths": {
      "Type": "ExpressionEval",
      "Parameters": {
        "paths.=": "[item.source_path for item in transfer_settings.DATA]"
      },
      "ResultPath": "$.src",
      "Next": "MakeComputeWorkingDir"
    },
    "MakeComputeWorkingDir": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/mkdir",
      "Parameters": {
        "endpoint_id.$": "$.transfer_settings.source_endpoint",
        "path.$": "$.run_vars.compute_output_directory"
      },
      "ResultPath": "$.mkdir_result",
      "Next": "RunComputeFunction"
    },
    "RunComputeFunction": {
      "Type": "Action",
      "ActionUrl": "https://compute.actions.globus.org/v3",
      "Parameters": {
        "endpoint_id.$": "$.run_vars.compute_endpoint_id",
        "tasks": [
          {
            "function_id.$": "$.run_vars.compute_function_id",
            "args": [],
            "kwargs": {
              "src_paths.$": "$.src.paths",
              "dest_path.$": "$.run_vars.compute_output_directory",
              "gcs_base_path.$": "$.run_vars.gcs_base_path"
            }
          }
        ]
      },
      "ResultPath": "$.compute_func_result",
      "Next": "GetDestinationPath"
    },
    "GetDestinationPath": {
      "Comment": "To get the dest path, check if the variable 'destination_path' exists and if not, default to the filename returned by the compute function.",
      "Type": "ExpressionEval",
      "Parameters": {
        "path.=": "getattr('destination_path', '/~/' + pathsplit(compute_func_result.details.result[0])[1])"
      },
      "ResultPath": "$.destination",
      "Next": "TransferFromComputeEndpoint"
    },
    "TransferFromComputeEndpoint": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/transfer",
      "Parameters": {
        "source_endpoint.$": "$.transfer_settings.source_endpoint",
        "destination_endpoint.$": "$.transfer_settings.destination_endpoint",
        "DATA": [
          {
            "source_path.=": "compute_func_result.details.result[0]",
            "destination_path.$": "$.destination.path"
          }
        ]
      },
      "ResultPath": "$.transfer_result",
      "Next": "CleanupComputeEndpoint"
    },
    "CleanupComputeEndpoint": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/delete",
      "Parameters": {
        "endpoint.$": "$.transfer_settings.source_endpoint",
        "recursive": true,
        "DATA": [
          {
            "path.$": "$.run_vars.compute_output_directory"
          }
        ]
      },
      "ResultPath": "$.delete_result",
      "End": true
    }
  }
}
@@ -0,0 +1,18 @@
{
  "type": "object",
  "required": [
    "transfer_settings"
  ],
  "properties": {
    "transfer_settings": {
      "type": "object",
      "format": "globus-transfer-transfer#0.10"
    },
    "destination_path": {
      "type": "string",
      "title": "Destination Collection Path",
      "description": "The path on the destination collection for the tarball file"
    }
  },
  "additionalProperties": false
}
@@ -0,0 +1,126 @@
{
  "StartAt": "SetRunVariables",
  "States": {
    "SetRunVariables": {
      "Type": "ExpressionEval",
      "Parameters": {
        "gcs_endpoint_id": "<INSERT YOUR GCS ENDPOINT ID HERE>",
        "gcs_base_path": "/",
        "compute_endpoint_id": "<INSERT YOUR COMPUTE ENDPOINT ID HERE>",
        "compute_function_id": "<INSERT YOUR COMPUTE FUNCTION ID HERE>",
        "compute_output_directory.=": "'/' + _context.run_id + '/'",
        "input_data_valid.=": "['~' in transfer_data.source_path or transfer_data.source_path == '/' for transfer_data in transfer_settings.DATA] == [False] * len(transfer_settings.DATA)"
      },
      "ResultPath": "$.run_vars",
      "Next": "ValidateTransferData"
    },
    "ValidateTransferData": {
      "Comment": "Validate that none of the input source paths are the path '/' or contain the character '~'.",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.run_vars.input_data_valid",
          "BooleanEquals": true,
          "Next": "CollectTransferData"
        }
      ],
      "Default": "InvalidTransferData"
    },
    "InvalidTransferData": {
      "Type": "Fail",
      "Error": "InvalidTransferData",
      "Cause": "Invalid source path in input transfer data."
    },
    "CollectTransferData": {
      "Comment": "Construct a list of source paths to provide to the compute function and the transfer data to move the source paths to the intermediate collection.",
      "Type": "ExpressionEval",
      "Parameters": {
        "src_paths.=": "[run_vars.compute_output_directory + pathsplit(transfer_data.source_path.rstrip('/'))[1] + '/' if transfer_data.source_path.endswith('/') else run_vars.compute_output_directory + pathsplit(transfer_data.source_path)[1] for transfer_data in transfer_settings.DATA]",
        "src_to_intermidate_transfer_data.=": "[{'source_path': transfer_data.source_path, 'destination_path': run_vars.compute_output_directory + pathsplit(transfer_data.source_path.rstrip('/'))[1] + '/' if transfer_data.source_path.endswith('/') else run_vars.compute_output_directory + pathsplit(transfer_data.source_path)[1], 'DATA_TYPE': transfer_data.DATA_TYPE, 'recursive': transfer_data.recursive} for transfer_data in transfer_settings.DATA]"
      },
      "ResultPath": "$.transfer_data",
      "Next": "MakeComputeWorkingDir"
    },
    "MakeComputeWorkingDir": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/mkdir",
      "Parameters": {
        "endpoint_id.$": "$.run_vars.gcs_endpoint_id",
        "path.$": "$.run_vars.compute_output_directory"
      },
      "ResultPath": "$.mkdir_result",
      "Next": "TransferToComputeEndpoint"
    },
    "TransferToComputeEndpoint": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/transfer",
      "Parameters": {
        "source_endpoint.$": "$.transfer_settings.source_endpoint",
        "destination_endpoint.$": "$.run_vars.gcs_endpoint_id",
        "DATA.$": "$.transfer_data.src_to_intermidate_transfer_data"
      },
      "ResultPath": "$.transfer_from_src_result",
      "Next": "RunComputeFunction"
    },
    "RunComputeFunction": {
      "Type": "Action",
      "ActionUrl": "https://compute.actions.globus.org/v3",
      "Parameters": {
        "endpoint_id.$": "$.run_vars.compute_endpoint_id",
        "tasks": [
          {
            "function_id.$": "$.run_vars.compute_function_id",
            "args": [],
            "kwargs": {
              "src_paths.$": "$.transfer_data.src_paths",
              "dest_path.$": "$.run_vars.compute_output_directory",
              "gcs_base_path.$": "$.run_vars.gcs_base_path"
            }
          }
        ]
      },
      "ResultPath": "$.compute_func_result",
      "Next": "GetDestinationPath"
    },
    "GetDestinationPath": {
      "Comment": "To get the dest path, check if the variable 'destination_path' exists and if not, default to the filename returned by the compute function.",
      "Type": "ExpressionEval",
      "Parameters": {
        "path.=": "getattr('destination_path', '/~/' + pathsplit(compute_func_result.details.result[0])[1])"
      },
      "ResultPath": "$.destination",
      "Next": "TransferFromComputeEndpoint"
    },
    "TransferFromComputeEndpoint": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/transfer",
      "Parameters": {
        "source_endpoint.$": "$.run_vars.gcs_endpoint_id",
        "destination_endpoint.$": "$.transfer_settings.destination_endpoint",
        "DATA": [
          {
            "source_path.=": "compute_func_result.details.result[0]",
            "destination_path.$": "$.destination.path"
          }
        ]
      },
      "ResultPath": "$.transfer_to_dest_result",
      "Next": "CleanupComputeEndpoint"
    },
    "CleanupComputeEndpoint": {
      "Type": "Action",
      "ActionUrl": "https://transfer.actions.globus.org/delete",
      "Parameters": {
        "endpoint.$": "$.run_vars.gcs_endpoint_id",
        "recursive": true,
        "DATA": [
          {
            "path.$": "$.run_vars.compute_output_directory"
          }
        ]
      },
      "ResultPath": "$.delete_compute_output_result",
      "End": true
    }
  }
}
@@ -0,0 +1,18 @@
{
  "type": "object",
  "required": [
    "transfer_settings"
  ],
  "properties": {
    "transfer_settings": {
      "type": "object",
      "format": "globus-transfer-transfer#0.10"
    },
    "destination_path": {
      "type": "string",
      "title": "Destination Collection Path",
      "description": "The path on the destination collection for the tarball file"
    }
  },
  "additionalProperties": false
}