- `error_threshold`: The number of record failures for `TabularDataset` and file failures for `FileDataset` that should be ignored during processing. If the error count for the entire input goes above this value, the job is aborted. The error threshold is for the entire input, not for individual mini-batches sent to the `run()` method. The range is `[-1, int.max]`. `-1` indicates ignoring all failures during processing.
- `output_action`: One of the following values indicates how the output will be organized:
  - `summary_only`: The user script will store the output. `ParallelRunStep` will use the output only for the error threshold calculation.
  - `append_row`: For all inputs, only one file is created in the output folder, and all outputs are appended to it, separated by line.
  - `append_row_file_name`: To customize the output file name for the `append_row` `output_action` (optional; default value is `parallel_run_step.txt`).
- `source_directory`: Path to the folder that contains all files to execute on the compute target (optional).
- `compute_target`: Only `AmlCompute` is supported.
- `node_count`: The number of compute nodes to be used for running the user script.
- `process_count_per_node`: The number of processes per node. As a best practice, set this to the number of GPUs or CPUs one node has (optional; default value is `1`).
- `environment`: The Python environment definition. You can configure it to use an existing Python environment or to set up a temporary environment. The definition is also responsible for setting the required application dependencies (optional).
- `logging_level`: Log verbosity. Values in increasing verbosity are `WARNING`, `INFO`, and `DEBUG` (optional; default value is `INFO`).
- `run_invocation_timeout`: The `run()` method invocation timeout in seconds (optional; default value is `60`).
- `run_max_try`: Maximum number of tries of `run()` for a mini-batch. A `run()` call fails if an exception is thrown, or if nothing is returned when `run_invocation_timeout` is reached (optional; default value is `3`).
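
As a minimal sketch of how these options fit together, the following assumes a scoring script `batch_scoring.py`, an environment object `batch_env`, and an attached compute target `compute_target` (all placeholder names, not values from this article):

```python
from azureml.pipeline.steps import ParallelRunConfig

# batch_scoring.py, batch_env, and compute_target are placeholder
# assumptions; substitute your own script, environment, and compute.
parallel_run_config = ParallelRunConfig(
    source_directory="scripts",                    # folder containing the user script
    entry_script="batch_scoring.py",               # user script with init()/run()
    mini_batch_size="5",                           # for FileDataset: files per run() call
    error_threshold=10,                            # abort after 10 failures across the whole input
    output_action="append_row",                    # append all outputs to one file
    append_row_file_name="parallel_run_step.txt",  # the default name, made explicit
    compute_target=compute_target,                 # AmlCompute only
    node_count=2,                                  # compute nodes to use
    process_count_per_node=2,                      # ideally the GPU/CPU count per node
    environment=batch_env,                         # Python environment definition
    logging_level="INFO",
    run_invocation_timeout=60,                     # seconds per run() call
    run_max_try=3,                                 # tries per mini-batch
)
```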
You can specify `mini_batch_size`, `node_count`, `process_count_per_node`, `logging_level`, `run_invocation_timeout`, and `run_max_try` as `PipelineParameter`, so that you can fine-tune the parameter values when you resubmit a pipeline run. In this example, you use `PipelineParameter` for `mini_batch_size` and `process_count_per_node`, and you will change these values when you resubmit a run later, as sketched below.
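
A minimal sketch of that parameterization, using the hypothetical names `batch_size_param` and `process_count_param`:

```python
from azureml.pipeline.core import PipelineParameter

# Hypothetical parameter names; any names unique within the pipeline work.
batch_size_param = PipelineParameter(name="mini_batch_size", default_value="5")
process_count_param = PipelineParameter(name="process_count_per_node", default_value=2)

# Pass these in place of the literal values in ParallelRunConfig:
#   mini_batch_size=batch_size_param,
#   process_count_per_node=process_count_param,

# On resubmit, override the defaults without rebuilding the pipeline:
#   experiment.submit(pipeline, pipeline_parameters={
#       "mini_batch_size": "20",
#       "process_count_per_node": 4,
#   })
```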
Create the ParallelRunStep by using the script, environment configuration, and parameters. Specify the compute target that you already attached to your workspace as the target of execution for your inference script. Use `ParallelRunStep` to create the batch inference pipeline step, which takes all the following parameters:
- `name`: The name of the step, with the following naming restrictions: unique, 3-32 characters, and regex `^[a-z]([-a-z0-9]*[a-z0-9])?$`.
- `parallel_run_config`: A `ParallelRunConfig` object, as defined earlier.
- `inputs`: One or more single-typed Azure Machine Learning datasets to be partitioned for parallel processing.
- `side_inputs`: One or more pieces of reference data or datasets used as side inputs. Side inputs don't need to be partitioned.
- `output`: A `PipelineData` object that corresponds to the output directory.
- `arguments`: A list of arguments passed to the user script (optional).
- `allow_reuse`: Whether the step should reuse previous results when run with the same settings/inputs. If this parameter is `False`, a new run will always be generated for this step during pipeline execution (optional; default value is `True`).
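
As a minimal sketch, assuming `input_dataset` and `def_data_store` are placeholder names for a dataset and a datastore already registered in your workspace, and `parallel_run_config` is the object defined earlier:

```python
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import ParallelRunStep

# Output location for the step; def_data_store is a placeholder for a
# datastore registered in your workspace.
output_dir = PipelineData(name="inferences", datastore=def_data_store)

parallelrun_step = ParallelRunStep(
    name="batch-inference",                             # must match the naming regex above
    parallel_run_config=parallel_run_config,            # the ParallelRunConfig defined earlier
    inputs=[input_dataset.as_named_input("input_ds")],  # partitioned input dataset
    output=output_dir,
    arguments=["--model_name", "my_model"],             # optional; hypothetical user-script args
    allow_reuse=True,
)
```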