Commit 051cfbd

committed
fix
1 parent d973b06 commit 051cfbd

1 file changed

+11
-11
lines changed

articles/machine-learning/how-to-use-parallel-job-in-pipeline.md

Lines changed: 11 additions & 11 deletions
@@ -156,18 +156,18 @@ Azure Machine Learning parallel job exposes many optional settings that can auto
 
 | Key | Type | Description | Allowed values | Default value | Set in attribute or program argument |
 |--|--|--|--|--|--|
-| `mini_batch_error_threshold` | integer | Number of failed mini-batches to ignore in this parallel job. If the count of failed mini-batches is higher than this threshold, the parallel job is marked as failed.<br><br>The mini-batch is marked as failed if:<br>- The count of return from `run()` is less than the mini-batch input count.<br>- Exceptions are caught in custom `run()` code. | `[-1, int.max]` | `-1`, meaning ignore all failed mini-batches | Attribute `mini_batch_error_threshold` |
+| `mini_batch_error_threshold` | integer | Number of failed mini-batches to ignore in this parallel job. If the count of failed mini-batches is higher than this threshold, the parallel job is marked as failed.<br><br>The mini-batch is marked as failed if:<br>- The count of return from `run()` is less than the mini-batch input count.<br>- Exceptions are caught in custom `run()` code. | `[-1, int.max]` | `-1`, meaning ignore all failed mini-batches | Attribute `mini_batch_error_threshold` |
 | `mini_batch_max_retries` | integer | Number of retries when the mini-batch fails or times out. If all retries fail, the mini-batch is marked as failed per the `mini_batch_error_threshold` calculation. | `[0, int.max]` | `2` | Attribute `retry_settings.max_retries` |
 | `mini_batch_timeout` | integer | Timeout in seconds for executing the custom `run()` function. If execution time is higher than this threshold, the mini-batch is aborted and marked as failed to trigger retry. | `(0, 259200]` | `60` | Attribute `retry_settings.timeout` |
-| `item_error_threshold` | integer | The threshold of failed items. Failed items are counted by the number gap between inputs and returns from each mini-batch. If the sum of failed items is higher than this threshold, the parallel job is marked as failed. | `[-1, int.max]` | `-1`, meaning ignore all failures during parallel job | Program argument `--error_threshold` |
-| `allowed_failed_percent` | integer | Similar to `mini_batch_error_threshold`, but uses the percent of failed mini-batches instead of the count. | `[0, 100]` | `100` | Program argument `--allowed_failed_percent` |
-| `overhead_timeout` | integer | Timeout in seconds for initialization of each mini-batch. For example, load mini-batch data and pass it to the `run()` function. | `(0, 259200]` | `600` | Program argument `--task_overhead_timeout` |
-| `progress_update_timeout` | integer | Timeout in seconds for monitoring the progress of mini-batch execution. If no progress updates are received within this timeout setting, the parallel job is marked as failed. | `(0, 259200]` | Dynamically calculated by other settings | Program argument `--progress_update_timeout` |
-| `first_task_creation_timeout` | integer | Timeout in seconds for monitoring the time between the job start and the run of the first mini-batch. | `(0, 259200]` | `600` | Program argument `--first_task_creation_timeout` |
+| `item_error_threshold` | integer | The threshold of failed items. Failed items are counted by the number gap between inputs and returns from each mini-batch. If the sum of failed items is higher than this threshold, the parallel job is marked as failed. | `[-1, int.max]` | `-1`, meaning ignore all failures during parallel job | Program argument<br>`--error_threshold` |
+| `allowed_failed_percent` | integer | Similar to `mini_batch_error_threshold`, but uses the percent of failed mini-batches instead of the count. | `[0, 100]` | `100` | Program argument<br>`--allowed_failed_percent` |
+| `overhead_timeout` | integer | Timeout in seconds for initialization of each mini-batch. For example, load mini-batch data and pass it to the `run()` function. | `(0, 259200]` | `600` | Program argument<br>`--task_overhead_timeout` |
+| `progress_update_timeout` | integer | Timeout in seconds for monitoring the progress of mini-batch execution. If no progress updates are received within this timeout setting, the parallel job is marked as failed. | `(0, 259200]` | Dynamically calculated by other settings | Program argument<br>`--progress_update_timeout` |
+| `first_task_creation_timeout` | integer | Timeout in seconds for monitoring the time between the job start and the run of the first mini-batch. | `(0, 259200]` | `600` | Program argument<br>`--first_task_creation_timeout` |
 | `logging_level` | string | The level of logs to dump to user log files. | `INFO`, `WARNING`, or `DEBUG` | `INFO` | Attribute `logging_level` |
 | `append_row_to` | string | Aggregate all returns from each run of the mini-batch and output it into this file. May refer to one of the outputs of the parallel job by using the expression `${{outputs.<output_name>}}` | | | Attribute `task.append_row_to` |
-| `copy_logs_to_parent` | string | Boolean option whether to copy the job progress, overview, and logs to the parent pipeline job. | `True` or `False` | `False` | N/A | `--copy_logs_to_parent` |
-| `resource_monitor_interval` | integer | Time interval in seconds to dump node resource usage (for example cpu or memory) to log folder under the *logs/sys/perf* path.<br><br>**Note:** Frequent dump resource logs slightly slow execution speed. Set this value to `0` to stop dumping resource usage. | `[0, int.max]` | `600` | Program argument `--resource_monitor_interval` |
+| `copy_logs_to_parent` | string | Boolean option whether to copy the job progress, overview, and logs to the parent pipeline job. | `True` or `False` | `False` | Program argument<br>`--copy_logs_to_parent` |
+| `resource_monitor_interval` | integer | Time interval in seconds to dump node resource usage (for example cpu or memory) to log folder under the *logs/sys/perf* path.<br><br>**Note:** Frequent dump resource logs slightly slow execution speed. Set this value to `0` to stop dumping resource usage. | `[0, int.max]` | `600` | Program argument<br>`--resource_monitor_interval` |
 
 The following sample code updates these settings:
 
@@ -241,6 +241,6 @@ To debug parallel job failure, select the **Outputs + logs** tab, expand the *lo
 
 ## Related content
 
-- [CLI (v2) parallel job YAML schema](reference-yaml-job-parallel.md).
-- [Create and manage data assets](how-to-create-data-assets.md).
-- [Schedule machine learning pipeline jobs](how-to-schedule-pipeline-job.md).
+- [CLI (v2) parallel job YAML schema](reference-yaml-job-parallel.md)
+- [Create and manage data assets](how-to-create-data-assets.md)
+- [Schedule machine learning pipeline jobs](how-to-schedule-pipeline-job.md)
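
The failure-accounting rules in the settings table touched by this commit (`mini_batch_error_threshold`, `item_error_threshold`, `allowed_failed_percent`) can be illustrated with a small, self-contained Python sketch. This is not the Azure Machine Learning implementation: `MiniBatchResult`, `is_failed`, `failed_items`, and `job_should_fail` are hypothetical names introduced only for illustration. The sketch also assumes that a mini-batch whose `run()` raises counts all of its inputs as failed items, and it ignores retries and timeouts, neither of which the table spells out in enough detail to model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MiniBatchResult:
    """Hypothetical record of one mini-batch after all retries are exhausted."""
    input_count: int
    # Number of items returned by run(); None models an exception in run().
    return_count: Optional[int]


def is_failed(result: MiniBatchResult) -> bool:
    # Per the table: failed if run() raised, or returned fewer items than it received.
    return result.return_count is None or result.return_count < result.input_count


def failed_items(result: MiniBatchResult) -> int:
    # Per the table: failed items are the gap between inputs and returns.
    # Assumption: an exception means zero items were returned.
    returned = 0 if result.return_count is None else result.return_count
    return max(result.input_count - returned, 0)


def job_should_fail(
    results: Sequence[MiniBatchResult],
    mini_batch_error_threshold: int = -1,  # -1 means ignore all failed mini-batches
    item_error_threshold: int = -1,        # -1 means ignore all failed items
    allowed_failed_percent: int = 100,     # percent of failed mini-batches tolerated
) -> bool:
    failed_batches = sum(is_failed(r) for r in results)
    failed_item_total = sum(failed_items(r) for r in results)

    if mini_batch_error_threshold != -1 and failed_batches > mini_batch_error_threshold:
        return True
    if item_error_threshold != -1 and failed_item_total > item_error_threshold:
        return True
    if results and 100 * failed_batches / len(results) > allowed_failed_percent:
        return True
    return False


# Three mini-batches of 10 items: one clean, one short by 2 items, one that raised.
results = [
    MiniBatchResult(10, 10),
    MiniBatchResult(10, 8),
    MiniBatchResult(10, None),
]
print(job_should_fail(results))                                # defaults tolerate everything
print(job_should_fail(results, mini_batch_error_threshold=1))  # 2 failed mini-batches > 1
print(job_should_fail(results, item_error_threshold=12))       # 12 failed items is not > 12
```

With the defaults from the table (`-1`, `-1`, `100`) no number of failures marks the job as failed, so a job that should stop on bad data needs at least one of these thresholds set explicitly.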
