KeyError: 'Message' when encountering an error in _send_metrics #4482

@sziem

Describe the bug
When an error occurs while calling run.log_metric, the SDK raises KeyError: 'Message' instead of showing the actual error message.

To reproduce
This is hard to reproduce reliably: it occurred seemingly at random, after training had run without problems for 42 epochs.

Expected behavior
An error message that describes the actual cause of the failure.

Screenshots or logs

Train epoch 43:  68%|██████▊   | 622/921 [15:36<07:30,  1.51s/it]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/entrypoints/train_clrnet.py", line 69, in main
    runner.train()
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 185, in train
    self._train_epoch(_sagemaker_run)
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 269, in _train_epoch
    _log_training_metrics(
  File "/home/ec2-user/train-clrnet/src/man_adt/model_training/train_clrnet/engine/runner.py", line 406, in _log_training_metrics
    run.log_metric(name="Learning Rate", value=lr, step=step)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_utils.py", line 90, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/run.py", line 297, in log_metric
    self._metrics_manager.log_metric(
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 138, in log_metric
    self.sink.log_metric(metric_data)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 173, in log_metric
    self._drain()
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 187, in _drain
    self._send_metrics(available_metrics)
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/train-net-3hJI2r0s-py3.10/lib/python3.10/site-packages/sagemaker/experiments/_metrics.py", line 200, in _send_metrics
    message = errors[0]["Message"]
KeyError: 'Message'
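
The last frame shows the error entry being indexed with ["Message"] directly, which fails when that key is absent. Below is a minimal sketch of a more defensive lookup, not the SDK's actual code. The helper name describe_batch_errors is hypothetical, and the "Code" and "MetricIndex" keys are assumptions about what an error entry might contain instead of "Message".

```python
from typing import Any, Mapping, Sequence


def describe_batch_errors(errors: Sequence[Mapping[str, Any]]) -> str:
    """Build a readable message from error entries without assuming any key exists.

    Hypothetical helper: the "Code" key is an assumed alternative to "Message".
    """
    if not errors:
        return "unknown error (empty error list)"
    first = errors[0]
    # Prefer a human-readable message, fall back to a code, then to the raw entry.
    detail = first.get("Message") or first.get("Code") or repr(dict(first))
    return f"{len(errors)} metric(s) failed to log; first error: {detail}"
```

With a lookup like this, an entry that lacks "Message" would still produce a description of the failure rather than a KeyError.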

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: '2.209.0'
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): -
  • Framework version: -
  • Python version: 3.10
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context
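
Until the error handling is fixed, one possible caller-side workaround is to guard the logging call so that a failed metric upload does not abort a long training run. This is only a sketch: log_metric_safely is a hypothetical helper, and it assumes that losing an occasional metric point is acceptable.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def log_metric_safely(log_fn: Callable[..., Any], **kwargs: Any) -> bool:
    """Call log_fn(**kwargs); return False instead of raising on KeyError.

    Hypothetical helper: it only catches the KeyError seen in the traceback,
    so any other exception still propagates to the caller.
    """
    try:
        log_fn(**kwargs)
        return True
    except KeyError as exc:
        # The real cause is hidden by the KeyError, so only the symptom can be logged.
        logger.warning("Metric logging failed (%r); continuing. kwargs=%s", exc, kwargs)
        return False
```

In the training loop from the traceback this would be called as log_metric_safely(run.log_metric, name="Learning Rate", value=lr, step=step). Note that this hides the failure rather than explaining it, so it does not replace a proper fix.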
