doc/amazon_sagemaker_operators_for_kubernetes_jobs.rst
129 additions & 0 deletions
@@ -304,6 +304,135 @@ job stops or completes.
 continue to show on the Amazon SageMaker console. The delete command
 takes about 2 minutes to clean up the resources from Amazon SageMaker.

307
+
SageMaker Debugger Jobs
308
+
^^^^^^^^^^^^^^^^^^^^^^^
309
+
+When you create a SageMaker training job, you can optionally run
+asynchronous debugger jobs for your model. SageMaker Debugger gives you
+full visibility into a training job by using a hook to capture tensors
+that define the state of the training process at each point in its
+lifecycle. It also lets you define rules to analyze the captured
+tensors. See `SageMaker Debugger Introduction <https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html>`__ and `How Debugger Works <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-how-it-works.html>`__ for details.
+
+You can get more details on a debugger job by using the kubectl ``describe`` verb.
+The output of describing a training job now includes a new field, ``Debug Rule Evaluation Statuses:``
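
As a rough sketch of the describe step above, the shell helper below pipes the describe output through ``grep`` to show only the new field and the lines that follow it. The function name ``show_debug_rule_statuses``, the ``trainingjob`` resource name, the job name in the usage comment, and the 10-line context window are assumptions made for illustration; they are not taken from the text above.

```shell
# Hypothetical helper: print the debugger rule statuses for one training job.
# Assumes the operator exposes training jobs under the resource name
# "trainingjob" and that the describe output contains the field named above.
show_debug_rule_statuses() {
  # $1 is the name of the training job resource in the current namespace.
  # -A 10 prints the matching line plus up to 10 lines after it; the
  # window size is an arbitrary choice, adjust it to the number of rules.
  kubectl describe trainingjob "$1" | grep -A 10 "Debug Rule Evaluation Statuses:"
}

# Example usage (the job name is a placeholder):
# show_debug_rule_statuses my-debugger-training-job
```

Wrapping the command in a function keeps the job name as the only input, so the same filter can be reused while a job is running to check how each rule evaluation progresses.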