Closed
Description
Describe the bug
When an error occurs while calling run.log_metric, the actual error message is not shown; a KeyError is raised instead.
To reproduce
It is a bit hard for me to describe how to reproduce this, as it occurred randomly after training had run fine for 42 epochs.
Expected behavior
The raised error should carry a message describing the actual cause of the failure.
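To illustrate the expected behavior, here is a minimal sketch of building the error text without assuming that any particular key is present. The helper name describe_first_error is hypothetical and not part of the SDK, and the sample entry with "Code" and "MetricIndex" fields is an assumption about what the service might return, not something confirmed by the logs.

```python
from typing import Any, Mapping, Sequence


def describe_first_error(errors: Sequence[Mapping[str, Any]]) -> str:
    """Build a readable message from the first error entry.

    Hypothetical helper: it never indexes a key that may be missing.
    """
    if not errors:
        return "unknown error (empty error list)"
    first = errors[0]
    # Prefer a human-readable message, then fall back to an error code.
    for key in ("Message", "Code"):
        if key in first:
            return f"{key}={first[key]!r}"
    # Last resort: show the raw entry so the cause is still visible.
    return f"unrecognized error entry: {dict(first)!r}"


# Assumed shape of an error entry that has no "Message" key.
print(describe_first_error([{"Code": "VALIDATION_ERROR", "MetricIndex": 0}]))
```

With something along these lines, the failure would surface whatever detail the entry contains instead of a KeyError that hides it.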
Screenshots or logs
Train epoch 43: 68%|██████▊ | 622/921 [15:36<07:30, 1.51s/it]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/entrypoints/train_clrnet.py", line 69, in main
    runner.train()
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 185, in train
    self._train_epoch(_sagemaker_run)
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 269, in _train_epoch
    _log_training_metrics(
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 406, in _log_training_metrics
    run.log_metric(name="Learning Rate", value=lr, step=step)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_utils.py", line 90, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/run.py", line 297, in log_metric
    self._metrics_manager.log_metric(
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 138, in log_metric
    self.sink.log_metric(metric_data)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 173, in log_metric
    self._drain()
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 187, in _drain
    self._send_metrics(available_metrics)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 200, in _send_metrics
    message = errors[0]["Message"]
KeyError: 'Message'
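Until the underlying cause is visible, one possible user-side workaround is to treat metric logging as best-effort so that a failure inside it does not abort a long training run. The sketch below is only an illustration: safe_log_metric is a hypothetical helper, not an SDK function, and it accepts any callable so it does not depend on SDK internals.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_log_metric(log_metric: Callable[..., Any], **kwargs: Any) -> bool:
    """Call a metric-logging function and keep going if it raises.

    Returns True when the call succeeded, False when it was skipped.
    """
    try:
        log_metric(**kwargs)
        return True
    except Exception as exc:  # deliberately broad: logging is best-effort here
        logger.warning(
            "Skipping metric %r: %s: %s",
            kwargs.get("name"),
            type(exc).__name__,
            exc,
        )
        return False
```

In the training loop from the traceback, the call would then read safe_log_metric(run.log_metric, name="Learning Rate", value=lr, step=step). The trade-off is that dropped metrics only show up as warnings, so this hides the symptom rather than fixing the missing error message.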
System information
A description of your system. Please provide:
- SageMaker Python SDK version: '2.209.0'
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): -
- Framework version: -
- Python version: 3.10
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
lorenzwalthert