Commit 8267929

PYTHONSDK-97: Adding documentation and code samples for new helper functions.
1 parent: 6b6c2c1

5 files changed (+229, −21 lines)

README.md

Lines changed: 26 additions & 21 deletions

````diff
@@ -1,29 +1,20 @@
-Spectra S3 Python3 SDK
---------------
-
+# Spectra S3 Python3 SDK
 [![Apache V2 License](http://img.shields.io/badge/license-Apache%20V2-blue.svg)](https://github.com/SpectraLogic/ds3_python3_sdk/blob/master/LICENSE.md)
 
 An SDK conforming to the Spectra S3 [specification](https://developer.spectralogic.com/doc/ds3api/1.2/wwhelp/wwhimpl/js/html/wwhelp.htm) for Python 3.6
 
-Contact Us
-----------
-
+## Contact Us
 Join us at our [Google Groups](https://groups.google.com/d/forum/spectralogicds3-sdks) forum to ask questions, or see frequently asked questions.
 
-Installing
-----------
-
+## Installing
 To install the ds3_python3_sdk, either clone the latest code, or download a release bundle from [Releases](http://github.com/SpectraLogic/ds3_python3_sdk/releases). Once the code has been downloaded, cd into the bundle, and install it with `sudo python3 setup.py install`
 
 Once `setup.py` completes, the ds3_python3_sdk should be installed and available to be imported into python scripts.
 
-Documentation
--------------
+## Documentation
 The documentation for the SDK can be found at [http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/](http://spectralogic.github.io/ds3_python3_sdk/sphinx/v3.4.1/)
 
-SDK
----
-
+## SDK
 The SDK provides an interface for a user to add Spectra S3 functionality to their existing or new python application. In order to take advantage of the SDK you need to import the `ds3` python package and module. The following is an example that creates a Spectra S3 client from environment variables, creates a bucket, and lists all the buckets that are visible to the user.
 
 ```python
@@ -40,8 +31,7 @@ for bucket in getServiceResponse.result['BucketList']:
     print(bucket['Name'])
 ```
 
-Client
----------
+## Client
 In the ds3_python3_sdk there are two ways that you can create a `Client` instance: environment variables, or manually. `ds3.createClientFromEnv` will create a `Client` using the following environment variables:
 
 * `DS3_ENDPOINT` - The URL to the DS3 Endpoint
@@ -61,10 +51,27 @@ client = ds3.Client("endpoint", ds3.Credentials("access_key", "secret_key"))
 
 The proxy URL can be passed in as the named parameter `proxy` to `Client()`.
 
-Putting Data
-------------
+## Examples Communicating with the BP
+
+[An example of using getService and getBucket to list all accessible buckets and objects](samples/listAll.py)
 
-To put data to a Spectra S3 appliance you have to do it inside of the context of what is called a Bulk Job. Bulk Jobs allow the Spectra S3 appliance to plan how data should land to cache, and subsequently get written/read to/from tape. The basic flow of every job is:
+### HELPERS: Simple way of moving data to/from a file system
+There are helper utilities for putting and getting data to a BP. These are designed to simplify the user workflow so
+that you don't have to worry about BP job management. The helpers will create BP jobs as necessary, and transfer data
+in parallel to improve performance.
+
+#### How to move everything:
+- [An example of putting ALL files in a directory to a BP bucket](samples/putting_all_files_in_directory.py)
+- [An example of getting ALL objects in a bucket and landing them in a directory](samples/getting_all_objects_in_bucket.py)
+
+#### How to move some things:
+If you only want to move some items in a directory/bucket, you can specify them individually. These examples show how
+to put and get a specific file, but the principle can be expanded to transferring multiple items at once.
+- [An example of putting ONE file to a BP bucket](samples/putting_one_file_in_directory.py)
+- [An example of getting ONE object in a bucket](samples/getting_one_file_in_directory.py)
+
+### Moving data the old way
+To put data to a Spectra S3 appliance you have to do it inside the context of what is called a Bulk Job. Bulk Jobs allow the Spectra S3 appliance to plan how data should land to cache, and subsequently get written/read to/from tape. The basic flow of every job is:
 
 * Generate the list of objects that will either be sent to or retrieved from Spectra S3
 * Send a bulk put/get to Spectra S3 to plan the job
@@ -76,6 +83,4 @@ To put data to a Spectra S3 appliance you have to do it inside of the context of
 
 [An example of getting data with the Python SDK](samples/gettingData.py)
 
-[An example of using getService and getBucket to list all accessible buckets and objects](samples/listAll.py)
-
 [An example of how to give objects on the server a different name than what is on the filesystem, and how to delete objects by folder](samples/renaming.py)
````
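The bulk-job flow described above (build an object list, plan the job, then move data chunk by chunk) can be sketched in plain Python. `plan_chunks` and its parameters are illustrative names for the planning step, not part of the ds3 API:

```python
from typing import Dict, List, Tuple


def plan_chunks(objects: Dict[str, int], max_chunk_bytes: int) -> List[List[Tuple[str, int]]]:
    """Illustrative stand-in for the appliance's job planning: group
    (name, size) pairs into chunks whose total stays under max_chunk_bytes."""
    chunks: List[List[Tuple[str, int]]] = []
    current: List[Tuple[str, int]] = []
    current_bytes = 0
    for name, size in sorted(objects.items()):
        # Start a new chunk when adding this object would exceed the limit.
        if current and current_bytes + size > max_chunk_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append((name, size))
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Three small "objects"; with a 100-byte chunk limit they split into two chunks.
plan = plan_chunks({"beowulf.txt": 60, "sherlock.txt": 50, "ulysses.txt": 30},
                   max_chunk_bytes=100)
for i, chunk in enumerate(plan):
    print("chunk", i, [name for name, _ in chunk])
```

In the real SDK the appliance performs this planning server-side; the sketch only shows why a job is transferred as a sequence of chunks rather than one monolithic stream.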
samples/getting_all_objects_in_bucket.py

Lines changed: 50 additions & 0 deletions

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import tempfile

from os import path, walk
from ds3 import ds3, ds3Helpers

# This example gets ALL objects within the bucket books and lands them in a temp folder.
# It uses the new helper functions, which create and manage the BP jobs behind the scenes.
#
# This assumes that there exists a bucket called books on the BP and that it contains objects.
# Running the putting_all_files_in_directory.py example will create this setup.

# The bucket that contains the objects.
bucket_name = "books"

# The directory on the file system where the objects will be landed.
# In this example, we are using a temporary directory for easy cleanup.
destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Retrieve all the objects in the desired bucket and land them in the specified directory.
#
# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
get_job_ids = helper.get_all_files_in_bucket(destination_dir=destination_directory.name, bucket=bucket_name)
print("BP get job IDs: " + str(get_job_ids))

# Verify that all the files have been landed in the folder.
for root, dirs, files in walk(top=destination_directory.name):
    for name in files:
        print("File: " + path.join(root, name))
    for name in dirs:
        print("Dir: " + path.join(root, name))

# Clean up the temp directory where we landed the files.
destination_directory.cleanup()
```
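The helpers transfer data in parallel, and the sample above notes that a `max_threads` knob is available for tuning. The effect of such a knob can be sketched with the standard library alone; `download_object` below is a placeholder for per-object work, not an SDK call:

```python
from concurrent.futures import ThreadPoolExecutor


def download_object(name: str) -> str:
    # Placeholder for the per-object transfer the helper runs in parallel.
    return "downloaded " + name


object_names = ["beowulf.txt", "sherlock.txt", "ulysses.txt"]

# A max_threads-style knob: max_workers bounds how many objects move at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(download_object, object_names))

print(results)
```

Raising the worker count helps when transfers are I/O-bound, but past the appliance's capacity extra threads add no throughput, which is why the knob is worth tuning rather than maximizing.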
samples/getting_one_file_in_directory.py

Lines changed: 57 additions & 0 deletions

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import tempfile

from os import path, walk
from ds3 import ds3, ds3Helpers

# This example gets ONE object from the bucket books and lands it in a temp folder.
# It uses the new helper functions, which create and manage the BP job behind the scenes.
#
# This assumes that there exists a bucket called books on the BP and that it contains the object beowulf.txt.
# Running the putting_one_file_in_directory.py example will create this setup.

# The bucket that contains the objects.
bucket_name = "books"

# The directory on the file system where the object will be landed.
# In this example, we are using a temporary directory for easy cleanup.
destination_directory = tempfile.TemporaryDirectory(prefix="books-dir")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Create a HelperGetObject for each item you want to retrieve from the BP bucket.
# This example only gets one object, but you can transfer more than one at a time.
# For each object you must specify the name of the object on the BP, and the file path where you want to land the file.
# Optionally, if versioning is enabled on your bucket, you can specify which version to retrieve.
# If you don't specify a version, the most recent one will be retrieved.
file_path = path.join(destination_directory.name, "beowulf.txt")
get_objects = [ds3Helpers.HelperGetObject(object_name="beowulf.txt", destination_path=file_path)]

# Retrieve the objects from the desired bucket.
# You can optionally specify max_threads to tune performance.
get_job_id = helper.get_objects(get_objects=get_objects, bucket=bucket_name)
print("BP get job ID: " + get_job_id)

# Verify that all the files have been landed in the folder.
for root, dirs, files in walk(top=destination_directory.name):
    for name in files:
        print("File: " + path.join(root, name))
    for name in dirs:
        print("Dir: " + path.join(root, name))

# Clean up the temp directory where we landed the files.
destination_directory.cleanup()
```
samples/putting_all_files_in_directory.py

Lines changed: 47 additions & 0 deletions

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import os

from ds3 import ds3, ds3Helpers

# This example puts ALL files within the sub-folder /samples/resources to the bucket called books.
# It uses the new helper functions, which create and manage the BP jobs behind the scenes.

# The bucket where the files will be landed.
bucket_name = "books"

# The directory that contains the files to be archived to the BP.
# In this example, we are moving all files in the ds3_python3_sdk/samples/resources folder.
directory_with_files = os.path.join(os.path.dirname(str(__file__)), "resources")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Make sure the bucket that we will be sending objects to exists.
client.put_bucket(ds3.PutBucketRequest(bucket_name))

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Archive all the files in the desired directory to the specified bucket.
# Note that the files' object names will be relative to the root directory you specified.
# For example: resources/beowulf.txt will be named just beowulf.txt in the BP bucket.
#
# You can optionally specify an objects_per_bp_job and max_threads to tune performance.
put_job_ids = helper.put_all_objects_in_directory(source_dir=directory_with_files, bucket=bucket_name)
print("BP put job IDs: " + str(put_job_ids))

# We now verify that all our objects have been sent to DS3.
bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))

for obj in bucketResponse.result['ContentsList']:
    print(obj['Key'])
```
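The relative-naming rule in the comments above (resources/beowulf.txt becomes just beowulf.txt) can be sketched with `os.path.relpath`; the little directory tree below is constructed purely for the demonstration:

```python
import os
import tempfile

# Build a small source tree: <root>/beowulf.txt and <root>/poems/iliad.txt.
root = tempfile.TemporaryDirectory(prefix="resources-")
os.makedirs(os.path.join(root.name, "poems"))
for rel in ("beowulf.txt", os.path.join("poems", "iliad.txt")):
    with open(os.path.join(root.name, rel), "w") as f:
        f.write("text")

# Object names are the file paths made relative to the chosen root,
# normalized to forward slashes as object-store keys usually are.
object_names = []
for dirpath, _, filenames in os.walk(root.name):
    for filename in filenames:
        full = os.path.join(dirpath, filename)
        object_names.append(os.path.relpath(full, start=root.name).replace(os.sep, "/"))

print(sorted(object_names))  # → ['beowulf.txt', 'poems/iliad.txt']
root.cleanup()
```

Choosing a deeper or shallower root therefore changes the resulting object names, which matters if other tools expect a particular key layout in the bucket.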
samples/putting_one_file_in_directory.py

Lines changed: 49 additions & 0 deletions

```python
# Copyright 2021 Spectra Logic Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use
# this file except in compliance with the License. A copy of the License is located at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.

import os

from ds3 import ds3, ds3Helpers

# This example puts ONE file, /samples/resources/beowulf.txt, to the bucket called books.
# It uses the new helper functions, which create and manage a single BP job.

# The bucket where the file will be landed.
bucket_name = "books"

# The file path being put to the BP.
file_path = os.path.join(os.path.dirname(str(__file__)), "resources", "beowulf.txt")

# Create a client which will be used to communicate with the BP.
client = ds3.createClientFromEnv()

# Make sure the bucket that we will be sending objects to exists.
client.put_bucket(ds3.PutBucketRequest(bucket_name))

# Create the helper to gain access to the new data movement utilities.
helper = ds3Helpers.Helper(client=client)

# Create a HelperPutObject for each item you want to send to the BP.
# This example only puts one file, but you can send more than one at a time.
# For each object you must specify the name it will be called on the BP, the file path, and the size of the file.
file_size = os.path.getsize(file_path)
put_objects = [ds3Helpers.HelperPutObject(object_name="beowulf.txt", file_path=file_path, size=file_size)]

# Archive the file to the specified bucket.
# You can optionally specify max_threads to tune performance.
put_job_id = helper.put_objects(put_objects=put_objects, bucket=bucket_name)
print("BP put job ID: " + put_job_id)

# We now verify that all our objects have been sent to DS3.
bucketResponse = client.get_bucket(ds3.GetBucketRequest(bucket_name))

for obj in bucketResponse.result['ContentsList']:
    print(obj['Key'])
```
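Each `HelperPutObject` above carries three fields: an object name, a file path, and a size. Collecting those fields for a batch of files can be sketched with the standard library; `PutSpec` and `build_put_specs` are illustrative stand-ins, not the SDK's class or helper:

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class PutSpec:
    # Illustrative stand-in for HelperPutObject's three fields.
    object_name: str
    file_path: str
    size: int


def build_put_specs(paths: List[str]) -> List[PutSpec]:
    # Derive the object name from the file name and read the size from disk.
    return [
        PutSpec(object_name=os.path.basename(p), file_path=p, size=os.path.getsize(p))
        for p in paths
    ]


# Demo: one temp file with 12 bytes of content.
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
    f.write("hello books!")
    temp_path = f.name

specs = build_put_specs([temp_path])
print(specs[0].size)  # → 12
os.remove(temp_path)
```

Precomputing the size up front, as the sample does with `os.path.getsize`, lets the appliance plan the bulk job before any bytes are transferred.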