Description
What did you find confusing? Please describe.
I was trying to run a Processing Job in script mode with custom dependencies provided through a requirements.txt file. To do that, I packed both the script and the requirements file into a single example.tar.gz file, uploaded that file to S3, and then passed its full S3 path as the source_dir argument to the run method of FrameworkProcessor.
Note that the documentation does not specify what the name of the file should be. It only says that it needs to be a tar.gz file, so I assumed that the name of the file does not matter:
sagemaker-python-sdk/src/sagemaker/processing.py
Lines 1519 to 1523 in 95bbe7a
source_dir (str): Path (absolute, relative or an S3 URI) to a directory
    with any other processing source code dependencies aside from the entry
    point file (default: None). If ``source_dir`` is an S3 URI, it must
    point to a tar.gz file. Structure within this directory are preserved
    when processing on Amazon SageMaker (default: None).
However, it turns out that the file name must be sourcedir.tar.gz. With any other name, the process running inside the container is not able to untar the file (as it does not know what to extract), and the job fails.
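For anyone hitting the same problem, here is a minimal sketch of packaging the files under the expected name, using only Python's standard tarfile module. The helper name build_sourcedir_archive and the file name process.py are hypothetical, and placing the files at the root of the archive is my assumption, not something the documentation states.

```python
import tarfile
from pathlib import Path


def build_sourcedir_archive(source_dir: str, output_dir: str) -> Path:
    """Pack the contents of source_dir into output_dir/sourcedir.tar.gz.

    Hypothetical helper for illustration; it is not part of the SageMaker SDK.
    output_dir should be outside source_dir so the archive does not include itself.
    """
    src = Path(source_dir)
    archive_path = Path(output_dir) / "sourcedir.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Store each entry relative to source_dir so that files such as
            # requirements.txt sit at the archive root (an assumption).
            tar.add(path, arcname=path.relative_to(src).as_posix(), recursive=False)
    return archive_path
```

The resulting sourcedir.tar.gz can then be uploaded to S3 with any tool, and its S3 URI passed as source_dir.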
Describe how documentation can be improved
Simply updating the lines from "it must point to a tar.gz file" to "it must point to a sourcedir.tar.gz file" should inform the user what file is expected.
Additional context
N/A