
Commit 40141ba

Rearrange job tasks alphabetically (#3535)
1 parent def3a22 commit 40141ba


docs/resources/job.md

Lines changed: 35 additions & 35 deletions
@@ -113,6 +113,7 @@ This block describes individual tasks:
 * `*_task` - (Required) one of the specific task blocks described below:
   * `condition_task`
   * `dbt_task`
+  * `for_each_task`
   * `notebook_task`
   * `pipeline_task`
   * `python_wheel_task`
@@ -121,7 +122,6 @@ This block describes individual tasks:
   * `spark_python_task`
   * `spark_submit_task`
   * `sql_task`
-  * `for_each_task`
 * `library` - (Optional) (Set) An optional list of libraries to be installed on the cluster that will execute the job.
 * `depends_on` - (Optional) block specifying dependency(-ies) for a given task.
 * `job_cluster_key` - (Optional) Identifier of the Job cluster specified in the `job_cluster` block.
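
For orientation (this is editorial and not part of the commit's diff), a minimal sketch of how the task-level fields listed above fit together in a `databricks_job` resource; the job name, notebook paths, runtime version, and node type are placeholder values:

```hcl
resource "databricks_job" "example" {
  name = "task wiring sketch" # placeholder name

  job_cluster {
    job_cluster_key = "shared"
    new_cluster {
      spark_version = "15.4.x-scala2.12" # placeholder runtime version
      node_type_id  = "i3.xlarge"        # placeholder node type
      num_workers   = 1
    }
  }

  task {
    task_key        = "ingest"
    job_cluster_key = "shared" # runs on the job cluster defined above

    notebook_task {
      notebook_path = "/Workspace/Shared/ingest" # placeholder path
    }

    # extra library installed on the cluster that executes this task
    library {
      pypi {
        package = "requests"
      }
    }
  }

  task {
    task_key = "report"

    # dependency on the previous task, by task_key
    depends_on {
      task_key = "ingest"
    }

    # no job_cluster_key / existing_cluster_id / new_cluster here, so this
    # task is executed using serverless compute (see the note in this doc)
    notebook_task {
      notebook_path = "/Workspace/Shared/report" # placeholder path
    }
  }
}
```

The `task_key` values are what `depends_on` refers to, which is why they must be unique within the job.
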
@@ -138,22 +138,35 @@ This block describes individual tasks:
 
 -> **Note** If no `job_cluster_key`, `existing_cluster_id`, or `new_cluster` were specified in task definition, then task will executed using serverless compute.
 
-#### spark_jar_task Configuration Block
+#### condition_task Configuration Block
 
-* `parameters` - (Optional) (List) Parameters passed to the main method.
-* `main_class_name` - (Optional) The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code should use `SparkContext.getOrCreate` to obtain a Spark context; otherwise, runs of the job will fail.
+The `condition_task` specifies a condition with an outcome that can be used to control the execution of dependent tasks.
 
-#### spark_submit_task Configuration Block
+* `left` - The left operand of the condition task. It could be a string value, job state, or a parameter reference.
+* `right` - The right operand of the condition task. It could be a string value, job state, or parameter reference.
+* `op` - The string specifying the operation used to compare operands. Currently, following operators are supported: `EQUAL_TO`, `GREATER_THAN`, `GREATER_THAN_OR_EQUAL`, `LESS_THAN`, `LESS_THAN_OR_EQUAL`, `NOT_EQUAL`. (Check the [API docs](https://docs.databricks.com/api/workspace/jobs/create) for the latest information).
 
-You can invoke Spark submit tasks only on new clusters. **In the `new_cluster` specification, `libraries` and `spark_conf` are not supported**. Instead, use --jars and --py-files to add Java and Python libraries and `--conf` to set the Spark configuration. By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set `--driver-memory`, and `--executor-memory` to a smaller value to leave some room for off-heap usage. **Please use `spark_jar_task`, `spark_python_task` or `notebook_task` wherever possible**.
+This task does not require a cluster to execute and does not support retries or notifications.
 
-* `parameters` - (Optional) (List) Command-line parameters passed to spark submit.
+#### dbt_task Configuration Block
 
-#### spark_python_task Configuration Block
+* `commands` - (Required) (Array) Series of dbt commands to execute in sequence. Every command must start with "dbt".
+* `source` - (Optional) The source of the project. Possible values are `WORKSPACE` and `GIT`. Defaults to `GIT` if a `git_source` block is present in the job definition.
+* `project_directory` - (Required when `source` is `WORKSPACE`) The path where dbt should look for `dbt_project.yml`. Equivalent to passing `--project-dir` to the dbt CLI.
+  * If `source` is `GIT`: Relative path to the directory in the repository specified in the `git_source` block. Defaults to the repository's root directory when not specified.
+  * If `source` is `WORKSPACE`: Absolute path to the folder in the workspace.
+* `profiles_directory` - (Optional) The relative path to the directory in the repository specified by `git_source` where dbt should look in for the `profiles.yml` file. If not specified, defaults to the repository's root directory. Equivalent to passing `--profile-dir` to a dbt command.
+* `catalog` - (Optional) The name of the catalog to use inside Unity Catalog.
+* `schema` - (Optional) The name of the schema dbt should run in. Defaults to `default`.
+* `warehouse_id` - (Optional) The ID of the SQL warehouse that dbt should execute against.
 
-* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`), workspace paths and remote repository are supported. For Python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. For files stored in a remote repository, the path must be relative. This field is required.
-* `source` - (Optional) Location type of the Python file, can only be `GIT`. When set to `GIT`, the Python file will be retrieved from a Git repository defined in `git_source`.
-* `parameters` - (Optional) (List) Command line parameters passed to the Python file.
+You also need to include a `git_source` block to configure the repository that contains the dbt project.
+
+#### for_each_task Configuration Block
+
+* `concurrency` - (Optional) Controls the number of active iteration task runs. Default is 20, maximum allowed is 100.
+* `inputs` - (Required) (String) Array for task to iterate on. This can be a JSON string or a reference to an array parameter.
+* `task` - (Required) Task to run against the `inputs` list.
 
 #### notebook_task Configuration Block
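
To make the three blocks that this hunk moves into alphabetical order easier to picture, here is an illustrative sketch (editorial, not part of the commit); the Git URL, warehouse ID, schema, notebook path, and parameter names are assumptions, and `{{job.parameters.*}}` / `{{input}}` are Databricks dynamic value references:

```hcl
resource "databricks_job" "dbt_flow" {
  name = "condition / dbt / for_each sketch" # placeholder name

  # job-level parameter referenced by the condition task below
  parameter {
    name    = "refresh_downstream"
    default = "true"
  }

  # repository holding the dbt project; dbt_task defaults to GIT when this block is present
  git_source {
    url      = "https://github.com/example-org/dbt-project" # placeholder repo
    provider = "gitHub"
    branch   = "main"
  }

  task {
    task_key = "dbt"
    dbt_task {
      commands     = ["dbt deps", "dbt seed", "dbt run"]
      warehouse_id = "1234567890abcdef" # placeholder SQL warehouse ID
      schema       = "analytics"        # placeholder schema
    }
  }

  task {
    task_key = "gate"
    depends_on {
      task_key = "dbt"
    }
    # compares a job parameter against a literal string
    condition_task {
      left  = "{{job.parameters.refresh_downstream}}"
      op    = "EQUAL_TO"
      right = "true"
    }
  }

  task {
    task_key = "per_table"
    depends_on {
      task_key = "gate"
      outcome  = "true" # only run when the condition evaluated to true
    }
    for_each_task {
      inputs      = jsonencode(["orders", "customers", "payments"])
      concurrency = 2
      task {
        task_key = "per_table_iteration"
        notebook_task {
          notebook_path = "/Workspace/Shared/process_table" # placeholder path
          base_parameters = {
            table = "{{input}}" # current element of the inputs array
          }
        }
      }
    }
  }
}
```

The nested `task` inside `for_each_task` is an ordinary task definition and runs once per element of `inputs`.
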

@@ -176,40 +189,27 @@ You can invoke Spark submit tasks only on new clusters. **In the `new_cluster` s
 * `parameters` - (Optional) Parameters for the task
 * `named_parameters` - (Optional) Named parameters for the task
 
-#### dbt_task Configuration Block
-
-* `commands` - (Required) (Array) Series of dbt commands to execute in sequence. Every command must start with "dbt".
-* `source` - (Optional) The source of the project. Possible values are `WORKSPACE` and `GIT`. Defaults to `GIT` if a `git_source` block is present in the job definition.
-* `project_directory` - (Required when `source` is `WORKSPACE`) The path where dbt should look for `dbt_project.yml`. Equivalent to passing `--project-dir` to the dbt CLI.
-  * If `source` is `GIT`: Relative path to the directory in the repository specified in the `git_source` block. Defaults to the repository's root directory when not specified.
-  * If `source` is `WORKSPACE`: Absolute path to the folder in the workspace.
-* `profiles_directory` - (Optional) The relative path to the directory in the repository specified by `git_source` where dbt should look in for the `profiles.yml` file. If not specified, defaults to the repository's root directory. Equivalent to passing `--profile-dir` to a dbt command.
-* `catalog` - (Optional) The name of the catalog to use inside Unity Catalog.
-* `schema` - (Optional) The name of the schema dbt should run in. Defaults to `default`.
-* `warehouse_id` - (Optional) The ID of the SQL warehouse that dbt should execute against.
-
-You also need to include a `git_source` block to configure the repository that contains the dbt project.
-
 #### run_job_task Configuration Block
 
 * `job_id` - (Required)(String) ID of the job
 * `job_parameters` - (Optional)(Map) Job parameters for the task
 
-#### condition_task Configuration Block
+#### spark_jar_task Configuration Block
 
-The `condition_task` specifies a condition with an outcome that can be used to control the execution of dependent tasks.
+* `parameters` - (Optional) (List) Parameters passed to the main method.
+* `main_class_name` - (Optional) The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code should use `SparkContext.getOrCreate` to obtain a Spark context; otherwise, runs of the job will fail.
 
-* `left` - The left operand of the condition task. It could be a string value, job state, or a parameter reference.
-* `right` - The right operand of the condition task. It could be a string value, job state, or parameter reference.
-* `op` - The string specifying the operation used to compare operands. Currently, following operators are supported: `EQUAL_TO`, `GREATER_THAN`, `GREATER_THAN_OR_EQUAL`, `LESS_THAN`, `LESS_THAN_OR_EQUAL`, `NOT_EQUAL`. (Check the [API docs](https://docs.databricks.com/api/workspace/jobs/create) for the latest information).
+#### spark_python_task Configuration Block
 
-This task does not require a cluster to execute and does not support retries or notifications.
+* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`), workspace paths and remote repository are supported. For Python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. For files stored in a remote repository, the path must be relative. This field is required.
+* `source` - (Optional) Location type of the Python file, can only be `GIT`. When set to `GIT`, the Python file will be retrieved from a Git repository defined in `git_source`.
+* `parameters` - (Optional) (List) Command line parameters passed to the Python file.
 
-#### for_each_task Configuration Block
+#### spark_submit_task Configuration Block
 
-* `concurrency` - (Optional) Controls the number of active iteration task runs. Default is 20, maximum allowed is 100.
-* `inputs` - (Required) (String) Array for task to iterate on. This can be a JSON string or a reference to an array parameter.
-* `task` - (Required) Task to run against the `inputs` list.
+You can invoke Spark submit tasks only on new clusters. **In the `new_cluster` specification, `libraries` and `spark_conf` are not supported**. Instead, use --jars and --py-files to add Java and Python libraries and `--conf` to set the Spark configuration. By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set `--driver-memory`, and `--executor-memory` to a smaller value to leave some room for off-heap usage. **Please use `spark_jar_task`, `spark_python_task` or `notebook_task` wherever possible**.
+
+* `parameters` - (Optional) (List) Command-line parameters passed to spark submit.
 
 #### sql_task Configuration Block
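
And an illustrative sketch (again editorial, not part of the diff) for the Spark-oriented and job-trigger blocks reordered in this hunk; the main class, JAR and script locations, job ID, and cluster sizing are placeholders:

```hcl
resource "databricks_job" "spark_flow" {
  name = "spark task sketch" # placeholder name

  job_cluster {
    job_cluster_key = "shared"
    new_cluster {
      spark_version = "15.4.x-scala2.12" # placeholder runtime version
      node_type_id  = "i3.xlarge"        # placeholder node type
      num_workers   = 2
    }
  }

  task {
    task_key        = "jar"
    job_cluster_key = "shared"
    spark_jar_task {
      main_class_name = "com.example.etl.Main" # placeholder class
      parameters      = ["--date", "2024-01-01"]
    }
    # the JAR containing the main class is attached as a library
    library {
      jar = "dbfs:/FileStore/jars/etl.jar" # placeholder location
    }
  }

  task {
    task_key        = "python"
    job_cluster_key = "shared"
    depends_on {
      task_key = "jar"
    }
    spark_python_task {
      python_file = "dbfs:/FileStore/scripts/postprocess.py" # placeholder URI
      parameters  = ["--env", "prod"]
    }
  }

  task {
    task_key = "legacy_submit"
    # spark_submit_task can only run on a new cluster
    new_cluster {
      spark_version = "15.4.x-scala2.12" # placeholder runtime version
      node_type_id  = "i3.xlarge"        # placeholder node type
      num_workers   = 2
    }
    spark_submit_task {
      # libraries / spark_conf are not supported here; pass them as arguments instead
      parameters = [
        "--class", "com.example.etl.Main",
        "--conf", "spark.speculation=true",
        "dbfs:/FileStore/jars/etl.jar",
      ]
    }
  }

  task {
    task_key = "fan_out"
    depends_on {
      task_key = "legacy_submit"
    }
    run_job_task {
      job_id = 123456789 # placeholder ID of an existing job
      job_parameters = {
        run_date = "2024-01-01"
      }
    }
  }
}
```

Per the doc text above, `spark_jar_task` / `spark_python_task` are preferred over `spark_submit_task`; the `legacy_submit` task is included only to show the block shape.
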
