Commit 0ef439c

committed
updates to documentation
1 parent cdfc9d5 commit 0ef439c

2 files changed

+128
-71
lines changed

README.md

Lines changed: 127 additions & 70 deletions
@@ -1,86 +1,92 @@
11
# Shepard
22

33
## Table of contents
4-
54
- [Shepard](#shepard)
6-
- [Getting started](#getting-started)
7-
- [What is Shepard?](#what-is-shepard)
8-
- [Requirements](#requirements)
9-
- [Hello World Example](#hello-world-example)
10-
- [What does running the Hello World Example do?](#what-does-running-the-hello-world-example-do)
11-
- [Ready to Use Shepard Setups](#ready-to-use-shepard-setups)
12-
- [Joining the Shepard Dev Team](#joining-the-shepard-dev-team)
13-
- [How to Join the Team](#how-to-join-the-team)
14-
- [General Team rules](#general-team-rules)
15-
- [Where to find things](#where-to-find-things)
16-
- [Technical Documentation](#technical-documentation)
17-
- [Overview of the Flock Architecture](#overview-of-the-flock-architecture)
18-
- [Right Sizing Jobs for Shepard](#right-sizing-jobs-for-shepard)
19-
- [Instantiating a Flock and Deploying Code to it](#instantiating-a-flock-and-deploying-code-to-it)
20-
- [The Structure of the "infrastructure" Folder](#the-structure-of-the-infrastructure-folder)
21-
- [The Structure of the "code" Folder](#the-structure-of-the-code-folder)
22-
- [Running a Job With Shepard](#running-a-job-with-shepard)
5+
* [Table of contents](#table-of-contents)
6+
* [Getting started](#getting-started)
7+
+ [What is Shepard?](#what-is-shepard-)
8+
+ [Requirements](#requirements)
9+
+ [Hello World Example](#hello-world-example)
10+
+ [What does running the Hello World Example do?](#what-does-running-the-hello-world-example-do-)
11+
* [Ready to Use Shepard Setups](#ready-to-use-shepard-setups)
12+
* [Joining the Shepard Dev Team](#joining-the-shepard-dev-team)
13+
+ [How to Join the Team](#how-to-join-the-team)
14+
+ [General Team rules](#general-team-rules)
15+
+ [Where to find things](#where-to-find-things)
16+
* [Technical Documentation](#technical-documentation)
17+
+ [Overview of the Flock Architecture](#overview-of-the-flock-architecture)
18+
+ [Right Sizing Jobs for Shepard](#right-sizing-jobs-for-shepard)
19+
+ [Instantiating a Flock and Deploying Code to it](#instantiating-a-flock-and-deploying-code-to-it)
20+
- [The Structure of the "infrastructure" Folder](#the-structure-of-the--infrastructure--folder)
21+
- [The Structure of the "code" Folder](#the-structure-of-the--code--folder)
22+
+ [Running a Job With Shepard](#running-a-job-with-shepard)
2323
- [Using the Shepard Batch Command](#using-the-shepard-batch-command)
2424
- [Using the Shepard Batch Via API Command](#using-the-shepard-batch-via-api-command)
25-
- [Writing Code for Use With Shepard](#writing-code-for-use-with-shepard)
25+
+ [Writing Code for Use With Shepard](#writing-code-for-use-with-shepard)
2626
- [Writing a Container For Use With Shepard](#writing-a-container-for-use-with-shepard)
2727
- [Shepard Code Example With Explanation](#shepard-code-example-with-explanation)
2828
- [Using the Quick-Deploy Feature](#using-the-quick-deploy-feature)
2929
- [Using Non-Public Container Images](#using-non-public-container-images)
30-
- [Deploy Secrets to a Flock](#deploy-secrets-to-a-flock)
31-
- [Collecting Results From Jobs Run With Shepard](#collecting-results-from-jobs-run-with-shepard)
30+
+ [Deploy Secrets to a Flock](#deploy-secrets-to-a-flock)
31+
+ [Collecting Results From Jobs Run With Shepard](#collecting-results-from-jobs-run-with-shepard)
3232
- [An Overview of Where Job Results Are Stored and How They Can be Retrieved](#an-overview-of-where-job-results-are-stored-and-how-they-can-be-retrieved)
33-
- [Tagging Outputs Automatically:](#tagging-outputs-automatically)
34-
- [Output Name Formats:](#output-name-formats)
35-
- [Detailed Documentation on Configuration Options for Shepard](#detailed-documentation-on-configuration-options-for-shepard)
33+
- [Tagging Outputs Automatically:](#tagging-outputs-automatically-)
34+
- [Output Name Formats:](#output-name-formats-)
35+
+ [Special Environment Variables](#special-environment-variables)
36+
- [Nonreserved Environment Variables:](#nonreserved-environment-variables-)
37+
- [Reserved Environment Variables:](#reserved-environment-variables-)
38+
- [Input Location Describing Environment Variables](#input-location-describing-environment-variables)
39+
- [Output Location Describing Environment Variables](#output-location-describing-environment-variables)
40+
- [Conditional Toggles Environment Variables](#conditional-toggles-environment-variables)
41+
+ [Detailed Documentation on Configuration Options for Shepard](#detailed-documentation-on-configuration-options-for-shepard)
3642
- [Flock Configuration Options](#flock-configuration-options)
37-
- [** General Stack Parameters**](#-general-stack-parameters)
38-
- [**Job Execution Parameters**](#job-execution-parameters)
39-
- [**S3 Parameters**](#s3-parameters)
40-
- [**DynamoDB Parameters**](#dynamodb-parameters)
41-
- [**Secrets Manager Parameters**](#secrets-manager-parameters)
42-
- [**Batch Parameters**](#batch-parameters)
43-
- [**Instance Tagging Parameters**](#instance-tagging-parameters)
44-
- [**ECR Parameters**](#ecr-parameters)
45-
- [**File System Parameters**](#file-system-parameters)
46-
- [**EFS Parameters**](#efs-parameters)
47-
- [**Lustre Parameters**](#lustre-parameters)
48-
- [**EBS Volume Parameters**](#ebs-volume-parameters)
49-
- [**SQS Parameters**](#sqs-parameters)
50-
- [**Lambda Parameters**](#lambda-parameters)
51-
- [**Extra IAM Policy Parameters**](#extra-iam-policy-parameters)
52-
- [**Networking Parameters**](#networking-parameters)
43+
* [** General Stack Parameters**](#---general-stack-parameters--)
44+
* [**Job Execution Parameters**](#--job-execution-parameters--)
45+
* [**S3 Parameters**](#--s3-parameters--)
46+
* [**DynamoDB Parameters**](#--dynamodb-parameters--)
47+
* [**Secrets Manager Parameters**](#--secrets-manager-parameters--)
48+
* [**Batch Parameters**](#--batch-parameters--)
49+
* [**Instance Tagging Parameters**](#--instance-tagging-parameters--)
50+
* [**ECR Parameters**](#--ecr-parameters--)
51+
* [**File System Parameters**](#--file-system-parameters--)
52+
+ [**EFS Parameters**](#--efs-parameters--)
53+
+ [**Lustre Parameters**](#--lustre-parameters--)
54+
* [**EBS Volume Parameters**](#--ebs-volume-parameters--)
55+
* [**SQS Parameters**](#--sqs-parameters--)
56+
* [**Lambda Parameters**](#--lambda-parameters--)
57+
* [**Extra IAM Policy Parameters**](#--extra-iam-policy-parameters--)
58+
* [**Networking Parameters**](#--networking-parameters--)
5359
- [CLI Configuration Options](#cli-configuration-options)
54-
- [Setting up AWSCLI Credentials](#setting-up-awscli-credentials)
55-
- [Shepard CLI Commands](#shepard-cli-commands)
56-
- [**auto_configure**](#auto_configure)
57-
- [**batch**](#batch)
58-
- [**batch_via_api**](#batch_via_api)
59-
- [**check_profile**](#check_profile)
60-
- [**check_role**](#check_role)
61-
- [**check_update**](#check_update)
62-
- [**clear_profile_config**](#clear_profile_config)
63-
- [**configure**](#configure)
64-
- [**delete_profile**](#delete_profile)
65-
- [**deploy**](#deploy)
66-
- [**describe**](#describe)
67-
- [**destroy**](#destroy)
68-
- [**query**](#query)
69-
- [**release_role**](#release_role)
70-
- [**retrieve**](#retrieve)
71-
- [**secretify**](#secretify)
72-
- [**set_profile**](#set_profile)
73-
- [**set_role**](#set_role)
74-
- [**where_am_i**](#where_am_i)
75-
- [Profiles in Shepard CLI](#profiles-in-shepard-cli)
76-
- [Profiles as a Concept](#profiles-as-a-concept)
77-
- [Importing Profiles](#importing-profiles)
78-
- [Understanding Setting Up a Profile Using the Configure Command](#understanding-setting-up-a-profile-using-the-configure-command)
79-
- [Assuming Role and Using the Shepard CLI](#assuming-role-and-using-the-shepard-cli)
80-
- [Assuming Role Without Using MFA](#assuming-role-without-using-mfa)
81-
- [Assuming Role Using MFA](#assuming-role-using-mfa)
82-
- [Using an Instance Attached Role](#using-an-instance-attached-role)
83-
- [Assuming Role Via the Shepard CLI](#assuming-role-via-the-shepard-cli)
60+
* [Setting up AWSCLI Credentials](#setting-up-awscli-credentials)
61+
* [Shepard CLI Commands](#shepard-cli-commands)
62+
+ [**auto_configure**](#--auto-configure--)
63+
+ [**batch**](#--batch--)
64+
+ [**batch_via_api**](#--batch-via-api--)
65+
+ [**check_profile**](#--check-profile--)
66+
+ [**check_role**](#--check-role--)
67+
+ [**check_update**](#--check-update--)
68+
+ [**clear_profile_config**](#--clear-profile-config--)
69+
+ [**configure**](#--configure--)
70+
+ [**delete_profile**](#--delete-profile--)
71+
+ [**deploy**](#--deploy--)
72+
+ [**describe**](#--describe--)
73+
+ [**destroy**](#--destroy--)
74+
+ [**query**](#--query--)
75+
+ [**release_role**](#--release-role--)
76+
+ [**retrieve**](#--retrieve--)
77+
+ [**secretify**](#--secretify--)
78+
+ [**set_profile**](#--set-profile--)
79+
+ [**set_role**](#--set-role--)
80+
+ [**where_am_i**](#--where-am-i--)
81+
* [Profiles in Shepard CLI](#profiles-in-shepard-cli)
82+
+ [Profiles as a Concept](#profiles-as-a-concept)
83+
+ [Importing Profiles](#importing-profiles)
84+
+ [Understanding Setting Up a Profile Using the Configure Command](#understanding-setting-up-a-profile-using-the-configure-command)
85+
* [Assuming Role and Using the Shepard CLI](#assuming-role-and-using-the-shepard-cli)
86+
+ [Assuming Role Without Using MFA](#assuming-role-without-using-mfa)
87+
+ [Assuming Role Using MFA](#assuming-role-using-mfa)
88+
+ [Using an Instance Attached Role](#using-an-instance-attached-role)
89+
+ [Assuming Role Via the Shepard CLI](#assuming-role-via-the-shepard-cli)
8490
- [Using GPUs For Jobs](#using-gpus-for-jobs)
8591

8692
## Getting started
@@ -383,6 +389,57 @@ Results uploaded to the outputs bucket at the end of a job from the path specifi
383389
* if TAG is specified : tag_to_append+'_result@' + UUID + '_' + START_TIME + '_lustre' + '.zip'
384390
* If TAG is not specified: 'result@' + UUID + '_' + START_TIME + '_lustre' + '.zip'
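The two Lustre name formats above can be reproduced with a short Python sketch. The function name `lustre_result_name` is hypothetical and not part of Shepard, and the exact string format Shepard uses for START_TIME is not shown here, so the value below is only a placeholder.

```python
def lustre_result_name(uuid, start_time, tag=None):
    """Build the Lustre result zip name following the two formats listed above.

    Hypothetical helper for illustration only; Shepard generates these names itself.
    """
    base = "result@" + uuid + "_" + start_time + "_lustre" + ".zip"
    # When TAG is set, its value and an underscore are placed in front of the name.
    return tag + "_" + base if tag else base


if __name__ == "__main__":
    # Placeholder values; real UUID and START_TIME values come from Shepard.
    print(lustre_result_name("1234", "START", tag="nightly"))
```

A helper like this may be handy when searching the outputs bucket for the result of a specific job.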
385391

392+
### Special Environment Variables
393+
Shepard sets some special environment variables, and users can set or query others. These variables either describe the existing configuration of an architecture or can be used to change its behavior.
394+
395+
#### Nonreserved Environment Variables:
396+
The following environment variables are special but not reserved. Setting these will modify the behavior of an existing architecture but will not cause jobs to be rejected.
397+
* __TAG__ - Setting this prepends the given value to the names of all files the architecture uploads to the output bucket.
398+
399+
#### Reserved Environment Variables:
400+
The following environment variables are reserved. Attempting to set these in an inputs.json or in a json_payload file will cause a job to be rejected.
401+
* __UUID__ - The unique UUID given to each job run by Shepard
402+
* __START_TIME__ - The start time in UTC of the job
403+
* __END_TIME__ - The end time in UTC of the job
404+
* __JOB_STATUS__ - The status of your job. Possible values are: 'in_progress','calling_payload_code','job failed; pushing logs to s3','<done>','job failed; cleaning up workspace','job_complete_cleaning_up_workspace','job_complete_pushing_to_s3' and 'not_yet_initiated'.
405+
* __EFS_INPUT_NAME__ - A path to a folder on the EFS file system you can write to if you've requested an EFS file system for your architecture. Will not be set if you don't request an EFS for your architecture.
406+
* __EFS_OUTPUT_NAME__ - A path to a folder on the EFS file system you can write to if you've requested an EFS file system for your architecture. Anything written here will be uploaded to the outputs bucket when the job finishes. Will not be set if you don't request an EFS for your architecture.
407+
* __LUSTRE_INPUT_NAME__ - A path to a folder on the Lustre file system you can write to if you've requested a Lustre file system for your architecture. Will not be set if you don't request Lustre for your architecture.
408+
* __LUSTRE_OUTPUT_NAME__ - A path to a folder on the Lustre file system you can write to if you've requested a Lustre file system for your architecture. Anything written here will be uploaded to the outputs bucket when the job finishes. Will not be set if you don't request Lustre for your architecture.
409+
* __ROOT_INPUT_NAME__ - A path to a folder on the root file system of the host instance that you can write to. By default this folder also contains the original input zip you uploaded to the input bucket, and all files stored in secrets manager (through the use of the "shepard_cli secretify" command) appear here in a folder called "secrets".
410+
* __ROOT_OUTPUT_NAME__ - A path to a folder on the root file system of the host instance that you can write to. Anything written here will be uploaded to the outputs bucket when the job finishes.
411+
* __INPUTS_BUCKET__ - The input or trigger s3 bucket that your architecture uses to batch out jobs. Uploading job zips here will batch out jobs.
412+
* __OUTPUTS_BUCKET__ - The output or results s3 bucket that your architecture uses to store results from successful jobs. All outputs will appear here as zip files.
413+
* __ERROR_BUCKET__ - The error s3 bucket that log files are written to in the event of a fatal error in a container. Outputs will appear here as formatted zip files.
414+
* __INPUT_ZIP_NAME__ - The name of the original zip file that was dropped into the trigger s3 bucket specified by INPUTS_BUCKET that triggered this job. A copy of this will be located in the path specified by ROOT_INPUT_NAME.
415+
* __PATH__ - The variable commonly used to denote the location of binaries on a machine. It is reserved so that it cannot be overwritten.
416+
* __HOSTNAME__ - The hostname of the instance your job is running on.
417+
* __USES_EFS__ - 'True' if you have requested EFS for this architecture and 'False' if you have not.
418+
* __USES_LUSTRE__ - 'True' if you have requested Lustre for this architecture and 'False' if you have not.
419+
* __LUSTRE_READ_ONLY_PATH__ - A path to a folder from which you can read (but not write) all data on the Lustre file system, if you've requested a Lustre file system for your architecture. Will not be set if you don't request Lustre for your architecture.
420+
* __EFS_READ_ONLY_PATH__ - A path to a folder from which you can read (but not write) all data on the EFS, if you've requested an EFS file system for your architecture. Will not be set if you don't request an EFS for your architecture.
421+
* __ULIMIT_FILENO__ - The maximum number of files you can open in your container, given as a string. It equals the value of the UlimitsNoFilesOpen parameter in the CloudFormation template (1048576 by default). As of 27 May 2020 this number cannot exceed 1048576, because larger values have been found to prevent Batch instances running the default AMI from booting.
422+
* __IS_INVOKED__ - 'True' if this job was created via the API batching endpoint and 'False' if this job was created via s3 upload.
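Because setting any reserved name causes a job to be rejected, it can help to check a set of variables before submitting it. The following is a minimal sketch, assuming the variables are available as a flat mapping of names to values. The layout of inputs.json is not described in this section, and `find_reserved` is a hypothetical helper, not a Shepard function.

```python
# Names copied from the reserved list above.
RESERVED_NAMES = frozenset({
    "UUID", "START_TIME", "END_TIME", "JOB_STATUS",
    "EFS_INPUT_NAME", "EFS_OUTPUT_NAME",
    "LUSTRE_INPUT_NAME", "LUSTRE_OUTPUT_NAME",
    "ROOT_INPUT_NAME", "ROOT_OUTPUT_NAME",
    "INPUTS_BUCKET", "OUTPUTS_BUCKET", "ERROR_BUCKET",
    "INPUT_ZIP_NAME", "PATH", "HOSTNAME",
    "USES_EFS", "USES_LUSTRE",
    "LUSTRE_READ_ONLY_PATH", "EFS_READ_ONLY_PATH",
    "ULIMIT_FILENO", "IS_INVOKED",
})


def find_reserved(env_vars):
    """Return the sorted names in env_vars that collide with reserved names.

    Assumes env_vars is a flat mapping of variable names to values.
    """
    return sorted(name for name in env_vars if name in RESERVED_NAMES)


if __name__ == "__main__":
    # TAG is special but not reserved, so only UUID is reported.
    print(find_reserved({"TAG": "nightly", "UUID": "1234"}))
```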
423+
424+
#### Input Location Describing Environment Variables
425+
The following environment variables describe locations where users can fetch input files from.
426+
* __ROOT_INPUT_NAME__ - Calling os.getenv('ROOT_INPUT_NAME') will return a path to a folder on the root file system of the host instance that you can write to. By default this folder also contains the original input zip you uploaded to the input bucket, and all files stored in secrets manager (through the use of the "shepard_cli secretify" command) appear here in a folder called "secrets". If you have an EFS or Lustre file system as part of your architecture, the original input zip will be fetched to the temporary folders you are afforded write access to on either of those file systems (i.e. EFS_INPUT_NAME and LUSTRE_INPUT_NAME). If you have requested neither file system, then the input zip will be fetched to this folder.
427+
* __EFS_INPUT_NAME__ - Calling os.getenv('EFS_INPUT_NAME') will return a path to a folder on the EFS file system you can write to if you've requested an EFS file system for your architecture.
428+
* __EFS_READ_ONLY_PATH__ - Calling os.getenv('EFS_READ_ONLY_PATH') will return a path to a folder from which you can read (but not write) all data on the EFS, if you've requested an EFS file system for your architecture.
429+
* __LUSTRE_INPUT_NAME__ - Calling os.getenv('LUSTRE_INPUT_NAME') will return a path to a folder on the Lustre file system you can write to if you've requested a Lustre file system for your architecture.
430+
* __LUSTRE_READ_ONLY_PATH__ - Calling os.getenv('LUSTRE_READ_ONLY_PATH') will return a path to a folder from which you can read (but not write) all data on the Lustre file system, if you've requested a Lustre file system for your architecture.
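Since the input zip may land in different folders depending on which file systems a Flock has, job code can look for it in each input location in turn. The sketch below is an assumption-laden illustration: `locate_input_zip` is a hypothetical helper, and the search order (Lustre, then EFS, then root) is an arbitrary choice, not documented Shepard behavior.

```python
import os
from pathlib import Path


def locate_input_zip(env=os.environ):
    """Return the path of the original input zip, or None if it is not found.

    Checks each input folder that is set. The search order is an assumption.
    """
    zip_name = env.get("INPUT_ZIP_NAME")
    if not zip_name:
        return None
    for var in ("LUSTRE_INPUT_NAME", "EFS_INPUT_NAME", "ROOT_INPUT_NAME"):
        folder = env.get(var)  # unset when that file system was not requested
        if folder:
            candidate = Path(folder) / zip_name
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    print(locate_input_zip())
```

Passing the environment in as a parameter keeps the function easy to exercise outside a running job.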
431+
432+
#### Output Location Describing Environment Variables
433+
The following environment variables describe locations where users can deposit files they want to return as outputs from the execution of a job. Any file created during a job's execution that is not written to an output location is deleted when the job finishes running.
434+
* __ROOT_OUTPUT_NAME__ - Calling os.getenv('ROOT_OUTPUT_NAME') will return a path to a folder on the root file system of the host instance that you can write to. Anything written here will be uploaded to the outputs bucket when the job finishes.
435+
* __EFS_OUTPUT_NAME__ - Calling os.getenv('EFS_OUTPUT_NAME') will return a path to a folder on the EFS file system you can write to if you've requested an EFS file system for your architecture. Anything written here will be uploaded to the outputs bucket when the job finishes.
436+
* __LUSTRE_OUTPUT_NAME__ - Calling os.getenv('LUSTRE_OUTPUT_NAME') will return a path to a folder on the Lustre file system you can write to if you've requested a Lustre file system for your architecture. Anything written here will be uploaded to the outputs bucket when the job finishes.
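As a minimal sketch of returning a result from a job, the code below writes a JSON file into the folder named by ROOT_OUTPUT_NAME. The helper `write_result`, the file name `result.json`, and the payload are all illustrative assumptions. Only the environment variable name comes from the list above.

```python
import json
import os
from pathlib import Path


def write_result(payload, filename="result.json", env=os.environ):
    """Write payload as JSON into the root output folder and return its path."""
    out_dir = env.get("ROOT_OUTPUT_NAME")
    if out_dir is None:
        raise RuntimeError("ROOT_OUTPUT_NAME is not set; is this running inside a job?")
    target = Path(out_dir) / filename
    target.write_text(json.dumps(payload))
    return target


if __name__ == "__main__":
    # Only attempt the write when the variable is actually present.
    if os.getenv("ROOT_OUTPUT_NAME"):
        print(write_result({"status": "ok"}))
```

The same pattern applies to EFS_OUTPUT_NAME and LUSTRE_OUTPUT_NAME when those file systems have been requested.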
437+
438+
#### Conditional Toggles Environment Variables
439+
The following environment variables can be queried during job execution to determine whether a Flock has an EFS and/or a Lustre file system available for compute jobs to use.
440+
* __USES_EFS__ - Calling os.getenv('USES_EFS') will return "True" if you've requested an EFS file system for your architecture and "False" if you have not requested an EFS file system for your architecture.
441+
* __USES_LUSTRE__ - Calling os.getenv('USES_LUSTRE') will return "True" if you've requested a Lustre file system for your architecture and "False" if you have not requested a Lustre file system for your architecture.
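Note that these toggles are the strings "True" and "False", not Python booleans, so a non-empty "False" would still be truthy if tested directly. A small sketch of a safer check follows. `flock_has` is a hypothetical helper name.

```python
import os


def flock_has(file_system, env=os.environ):
    """Return True only if USES_<FILE_SYSTEM> is set to the string 'True'."""
    return env.get("USES_" + file_system.upper()) == "True"


if __name__ == "__main__":
    for fs in ("efs", "lustre"):
        print(fs, flock_has(fs))
```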
442+
386443
### Detailed Documentation on Configuration Options for Shepard
387444

388445
#### Flock Configuration Options

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "shepard"
3-
version = "1.0.0"
3+
version = "1.0.1"
44
authors = [
55
{ name="Shepard Team", email="shepard_dev_group@googlegroups.com" },
66
]
