Describe the bug
I'm trying to run a simple proof-of-concept model training workflow with SageMaker in Python, and I cannot get anything to work. The estimator's fit() method just hangs: no errors, no logs in the console, and no DEBUG output. I've already validated the IAM setup, the S3 inputs, etc., and everything is fine. If I bypass the estimator and create jobs manually with boto3, it works fine (although very clunkily because of how much code is required).
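For comparison, here is a minimal sketch of what the manual boto3 route could look like. It is not the exact code used for this report: the helper names `build_training_request` and `submit` are hypothetical, and the instance settings, stopping condition, and hyperparameters simply mirror the train args shown in the debug output further down. Only `submit` talks to AWS, so the request can be inspected without credentials.

```python
def build_training_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Build keyword arguments for the SageMaker CreateTrainingJob API.

    Hypothetical helper: the values mirror the estimator settings in this report.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 5,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
        # The API expects hyperparameter values as strings.
        "HyperParameters": {
            "max_depth": "5",
            "eta": "0.2",
            "objective": "reg:squarederror",
            "num_round": "10",
        },
    }


def submit(request):
    """Send the request with boto3 and return the new job's ARN."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker")
    return client.create_training_job(**request)["TrainingJobArn"]
```

Calling `submit(build_training_request(...))` with a unique job name, the XGBoost image URI, the role ARN, and the two S3 URIs is the part that the Estimator is supposed to do on its own.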
To reproduce
This is the Python script I'm attempting to run:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.estimator import Estimator

sagemaker_session = Session()
role = get_execution_role()
output_path = "s3://<redacted>/output"

xgboost_image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=sagemaker_session.boto_region_name,
    version="1.5-1"
)

xgboost_estimator = Estimator(
    image_uri=xgboost_image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=output_path,
    volume_size=5,
    sagemaker_session=sagemaker_session,
    enable_network_isolation=False,
    hyperparameters={
        "max_depth": 5,
        "eta": 0.2,
        "objective": "reg:squarederror",
        "num_round": 10
    }
)

s3_input_data = "s3://sagemaker-sample-data-us-east-1/processing/census/census-income.csv"
xgboost_estimator.fit({"train": s3_input_data})
The last line just hangs forever, with nothing happening. No errors. No logs generated in CloudWatch. No debug lines spat out when logging level is set to debug. No jobs being generated in SageMaker. It just fails to do anything at all.
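One way to narrow down a silent hang like this is to turn on DEBUG logging for the HTTP layers underneath the SDK and have Python dump every thread's stack periodically while fit() is blocked. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the hang happens inside the Python process (for example in a network call) rather than on the service side.

```python
import faulthandler
import logging
import sys


def enable_wire_logging():
    """Raise the standard loggers used by the SDK and its HTTP stack to DEBUG."""
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    for name in ("sagemaker", "botocore", "urllib3"):
        logging.getLogger(name).setLevel(logging.DEBUG)


def arm_hang_watchdog(timeout_s=60.0, file=None):
    """Dump all thread stacks to `file` every `timeout_s` seconds until disarmed.

    `file` must be backed by a real file descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def disarm_hang_watchdog():
    """Stop the periodic stack dumps."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `enable_wire_logging()` and `arm_hang_watchdog()` just before `xgboost_estimator.fit(...)` should print a traceback after each timeout showing the frame in which the call is stuck, which can then be attached to the report.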
Expected behavior
The call to fit() should succeed and create a SageMaker training job.
Screenshots or logs
This is the output when the script runs with the logging level set to DEBUG:
sagemaker.config INFO - Not applying SDK defaults from location: C:\ProgramData\sagemaker\sagemaker\config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: C:\Users\scott\AppData\Local\sagemaker\sagemaker\config.yaml
[12/17/24 06:26:26] INFO Loading cached SSO token for pvcts tokens.py:305
[12/17/24 06:26:28] INFO Ignoring unnecessary instance type: None. image_uris.py:528
DEBUG sagemaker_session found, preparing to emit telemetry... telemetry_logging.py:89
INFO SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features. To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk. telemetry_logging.py:90
DEBUG TelemetryOptOut flag is set to: False telemetry_logging.py:102
DEBUG Train args after processing defaults: {'input_config': [{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://sagemaker-sample-data-us-east-1/processing/census/census-income.csv', 'S3DataDistributionType': 'FullyReplicated'}}, 'ChannelName': 'train'}], 'role': 'arn:aws:iam::<redacted>:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_<redacted>', 'output_config': {'S3OutputPath': 's3://pbn-sagemaker-simplemodel-test/output'}, 'resource_config': {'VolumeSizeInGB': 5, 'InstanceCount': 1, 'InstanceType': 'ml.m5.large'}, 'stop_condition': {'MaxRuntimeInSeconds': 86400}, 'vpc_config': None, 'input_mode': 'File', 'job_name': 'sagemaker-xgboost-2024-12-17-06-26-28-534', 'hyperparameters': {'max_depth': '5', 'eta': '0.2', 'objective': 'reg:squarederror', 'num_round': '10'}, 'tags': None, 'metric_definitions': None, 'experiment_config': None, 'environment': None, 'enable_network_isolation': False, 'retry_strategy': None, 'image_uri': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.5-1', 'debugger_hook_config': {'S3OutputPath': 's3://pbn-sagemaker-simplemodel-test/output', 'CollectionConfigurations': []}, 'profiler_config': {'S3OutputPath': 's3://pbn-sagemaker-simplemodel-test/output', 'DisableProfiler': False}} estimator.py:2513
System information
A description of your system. Please provide:
- SageMaker Python SDK version: sagemaker 2.237.1, sagemaker-core 1.0.17
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost (SageMaker-provided image, per the script above)
- Framework version: 1.5-1
- Python version: 3.12.0
- CPU or GPU: CPU
- Custom Docker image (Y/N): N