
Commit 802cf17

committed
update parameters
1 parent 24a3ec0 commit 802cf17

File tree

1 file changed: +10 -10 lines changed


articles/machine-learning/v1/how-to-debug-parallel-run-step.md

Lines changed: 10 additions & 10 deletions
@@ -100,15 +100,15 @@ from <your_package> import <your_class>
 ### Parameters for ParallelRunConfig
 
 `ParallelRunConfig` is the major configuration for a `ParallelRunStep` instance within the Azure Machine Learning pipeline. You use it to wrap your script and configure the necessary parameters, including all of the following entries:
-- `entry_script`: A user script as a local file path that will be run in parallel on multiple nodes. If `source_directory` is present, use a relative path. Otherwise, use any path that's accessible on the machine.
+- `entry_script`: A user script as a local file path to be run in parallel on multiple nodes. If `source_directory` is present, a relative path should be used. Otherwise, use any path that's accessible on the machine.
 - `mini_batch_size`: The size of the mini-batch passed to a single `run()` call. (Optional; the default value is `10` files for `FileDataset` and `1MB` for `TabularDataset`.)
   - For `FileDataset`, it's the number of files with a minimum value of `1`. You can combine multiple files into one mini-batch.
-  - For `TabularDataset`, it's the size of data. Example values are `1024`, `1024KB`, `10MB`, and `1GB`. The recommended value is `1MB`. The mini-batch from `TabularDataset` will never cross file boundaries. For example, if you have .csv files with various sizes, the smallest file is 100 KB and the largest is 10 MB. If `mini_batch_size = 1MB` is set, the files smaller than 1 MB will be treated as one mini-batch. Files larger than 1 MB will be split into multiple mini-batches.
+  - For `TabularDataset`, it's the size of data. Example values are `1024`, `1024KB`, `10MB`, and `1GB`. The recommended value is `1MB`. A mini-batch from `TabularDataset` will never cross file boundaries. For example, if there are multiple .csv files of various sizes, the smallest being 100 KB and the largest 10 MB, and `mini_batch_size = 1MB` is set, then each file smaller than 1 MB is treated as one mini-batch and each file larger than 1 MB is split into multiple mini-batches.
 
   > [!NOTE]
   > TabularDatasets backed by SQL cannot be partitioned.
   > TabularDatasets from a single parquet file and single row group cannot be partitioned.
 
-- `error_threshold`: The number of record failures for `TabularDataset` and file failures for `FileDataset` that should be ignored during processing. If the error count for the entire input goes above this value, the job will be aborted. The error threshold is for the entire input and not for individual mini-batch sent to the `run()` method. The range is `[-1, int.max]`. The `-1` indicates ignoring all failures during processing.
+- `error_threshold`: The number of record failures for `TabularDataset` and file failures for `FileDataset` that should be ignored during processing. Once the error count for the entire input goes above this value, the job will be aborted. The error threshold is for the entire input, not for individual mini-batches sent to the `run()` method. The range is `[-1, int.max]`; `-1` indicates ignoring all failures during processing.
 - `output_action`: One of the following values indicates how the output will be organized:
   - `summary_only`: The user script needs to store the output files. The outputs of `run()` are used for the error threshold calculation only.
   - `append_row`: For all inputs, `ParallelRunStep` creates a single file in the output folder to append all outputs separated by line.
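The `TabularDataset` sizing rule above (mini-batches never cross file boundaries) can be sketched in a few lines. This is an illustration of the documented behavior only, not the service's actual partitioner, and the helper name `count_mini_batches` is made up:

```python
import math

MB = 1024 * 1024

def count_mini_batches(file_sizes_bytes, mini_batch_size=1 * MB):
    """Estimate the number of mini-batches for a set of files.

    Per the documented rule: a file smaller than mini_batch_size becomes
    one mini-batch; a larger file is split into ceil(size / mini_batch_size)
    mini-batches. Batches never span two files.
    """
    return sum(max(1, math.ceil(size / mini_batch_size))
               for size in file_sizes_bytes)

# A 100 KB file -> 1 mini-batch; a 10 MB file -> 10 mini-batches.
print(count_mini_batches([100 * 1024, 10 * MB]))  # 11
```

Note that this covers only `TabularDataset`; for `FileDataset`, `mini_batch_size` counts files, and several files can be combined into one mini-batch.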
@@ -158,7 +158,7 @@ parallelrun_step = ParallelRunStep(
 
 ## Debugging scripts from remote context
 
-The transition from debugging a scoring script locally to debugging a scoring script in an actual pipeline can be a difficult leap. For information on finding your logs in the portal, see [machine learning pipelines section on debugging scripts from a remote context](how-to-debug-pipelines.md). The information in that section also applies to a ParallelRunStep.
+The transition from debugging a scoring script locally to debugging a scoring script in an actual pipeline can be a difficult leap. For information on finding your logs in the portal, see the [machine learning pipelines section on debugging scripts from a remote context](how-to-debug-pipelines.md). The information in that section also applies to a ParallelRunStep.
 
 For example, the log file `70_driver_log.txt` contains information from the controller that launches the ParallelRunStep code.
 

@@ -168,7 +168,7 @@ Because of the distributed nature of ParallelRunStep jobs, there are logs from s
 
 - `~/logs/sys/master_role.txt`: This file provides the principal node (also known as the orchestrator) view of the running job. It includes task creation, progress monitoring, and the run result.
 
-Logs generated from entry script using EntryScript helper and print statements will be found in following files:
+Logs generated from the entry script using the EntryScript helper and print statements can be found in the following files:
 
 - `~/logs/user/entry_script_log/<node_id>/<process_name>.log.txt`: These files are the logs written from entry_script using the EntryScript helper.
 
@@ -178,7 +178,7 @@ Logs generated from entry script using EntryScript helper and print statements w
 
 For a concise understanding of errors in your script, there is:
 
-- `~/logs/user/error.txt`: This file will try to summarize the errors in your script.
+- `~/logs/user/error.txt`: This file summarizes the errors in your script.
 
 For more information on errors in your script, there is:
 

@@ -220,7 +220,7 @@ def init():
 
 def run(mini_batch):
     """Call once for a mini batch. Accept and return the list back."""
-    # This class is in singleton pattern and will return same instance as the one in init()
+    # This class is in singleton pattern. It returns the same instance as the one in init()
     entry_script = EntryScript()
     logger = entry_script.logger
     logger.info(f"{__file__}: {mini_batch}.")
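The comment in the hunk above refers to the singleton pattern: constructing `EntryScript` in both `init()` and `run()` yields the same shared object. A minimal sketch of that pattern in plain Python (a toy class for illustration, not the real `EntryScript` implementation):

```python
import logging

class EntryScriptSketch:
    """Toy singleton: every construction returns the same instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # First construction: create the one shared instance.
            cls._instance = super().__new__(cls)
            cls._instance.logger = logging.getLogger("entry_script")
        return cls._instance

first = EntryScriptSketch()   # e.g. created in init()
second = EntryScriptSketch()  # e.g. created later in run()
print(first is second)  # True: both names refer to one shared instance
```

Because both calls return the same instance, state set up in `init()` (such as the configured logger) is available again inside `run()` without being passed explicitly.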
@@ -258,7 +258,7 @@ You can spawn new processes in your entry script with [`subprocess`](https://doc
 
 The recommended approach is to use the [`run()`](https://docs.python.org/3/library/subprocess.html#subprocess.run) function with `capture_output=True`. Errors will show up in `logs/user/error/<node_id>/<process_name>.txt`.
 
-If you want to use `Popen()`, you should redirect stdout/stderr to files, like:
+If you would like to use `Popen()`, stdout/stderr should be redirected to files, like:
 ```python
 from pathlib import Path
 from subprocess import Popen
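The snippet in the hunk above is truncated after its imports. One possible shape for the redirect, with a hypothetical helper name, log directory, and file names (the article's actual code and the runtime's own paths may differ):

```python
import sys
from pathlib import Path
from subprocess import Popen

def run_with_redirect(cmd, log_dir="logs"):
    """Launch a child process with stdout/stderr redirected to files.

    The directory and file names here are illustrative assumptions,
    not the paths the ParallelRunStep runtime itself uses.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)
    with open(log_path / "child.stdout.txt", "w") as out, \
         open(log_path / "child.stderr.txt", "w") as err:
        proc = Popen(cmd, stdout=out, stderr=err)
        return proc.wait()  # return code; 0 means success

rc = run_with_redirect([sys.executable, "-c", "print('hello from child')"])
```

Opening the files before `Popen()` and passing them as `stdout=`/`stderr=` ensures the child's output lands in files you control rather than being lost with the worker process's streams.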
@@ -288,7 +288,7 @@ def init():
 >
 > If no `stdout` or `stderr` is specified, a subprocess created with `Popen()` in your entry script will inherit the setting of the worker process.
 >
-> `stdout` will write to `logs/sys/node/<node_id>/processNNN.stdout.txt` and `stderr` to `logs/sys/node/<node_id>/processNNN.stderr.txt`.
+> `stdout` will write to `~/logs/sys/node/<node_id>/processNNN.stdout.txt` and `stderr` to `~/logs/sys/node/<node_id>/processNNN.stderr.txt`.
 
 
 ## How do I write a file to the output directory, and then view it in the portal?
@@ -372,7 +372,7 @@ You can go into `~/logs/sys/error` to see if there's any exception. If there is
 ### When will a job stop?
 If not canceled, the job will stop with status:
 - Completed. If all mini-batches have been processed and output has been generated for `append_row` mode.
-- Failed. If `error_threshold` in [`Parameters for ParallelRunConfig`](#parameters-for-parallelrunconfig) is exceeded, or a system error occurred during the job.
+- Failed. If `error_threshold` in [`Parameters for ParallelRunConfig`](#parameters-for-parallelrunconfig) is exceeded, or a system error occurs during the job.
 
 ### Where to find the root cause of failure?
 You can follow the lead in `~/logs/job_result.txt` to find the cause and a detailed error log.
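The `error_threshold` rule behind the Failed status can be restated as a small predicate. This is only a sketch of the documented behavior (abort once the total error count for the input exceeds the threshold; `-1` means ignore all failures), not the service's actual code:

```python
def should_abort(error_count, error_threshold):
    """Sketch of the documented error_threshold rule.

    error_threshold = -1 means ignore all failures; otherwise the job
    fails once the total error count for the entire input (not per
    mini-batch) exceeds the threshold.
    """
    if error_threshold == -1:
        return False
    return error_count > error_threshold

print(should_abort(5, 10))     # False: still at or under the threshold
print(should_abort(11, 10))    # True: threshold exceeded, job fails
print(should_abort(1000, -1))  # False: -1 ignores all failures
```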
