This block describes individual tasks:

* `*_task` - (Required) one of the specific task blocks described below:
  * `condition_task`
  * `dbt_task`
  * `for_each_task`
  * `notebook_task`
  * `pipeline_task`
  * `python_wheel_task`
  * `run_job_task`
  * `spark_jar_task`
  * `spark_python_task`
  * `spark_submit_task`
  * `sql_task`
* `library` - (Optional) (Set) An optional list of libraries to be installed on the cluster that will execute the job.
* `depends_on` - (Optional) block specifying dependency(-ies) for a given task.
* `job_cluster_key` - (Optional) Identifier of the Job cluster specified in the `job_cluster` block.

-> **Note** If no `job_cluster_key`, `existing_cluster_id`, or `new_cluster` is specified in the task definition, the task will be executed using serverless compute.
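
For illustration, a minimal sketch of both modes is shown below; the `task_key` values, cluster settings, and notebook paths are placeholders, and the `notebook_task` and `job_cluster` bodies are assumed from the wider resource schema rather than this section:

```hcl
resource "databricks_job" "example" {
  name = "cluster-vs-serverless-sketch"

  job_cluster {
    job_cluster_key = "shared"
    new_cluster {
      spark_version = "15.4.x-scala2.12" # placeholder runtime version
      node_type_id  = "i3.xlarge"        # placeholder node type
      num_workers   = 1
    }
  }

  task {
    task_key        = "on_job_cluster"
    job_cluster_key = "shared" # runs on the job cluster declared above
    notebook_task {
      notebook_path = "/Shared/prepare" # placeholder notebook
    }
  }

  task {
    task_key = "on_serverless" # no cluster reference, so this task runs on serverless compute
    depends_on {
      task_key = "on_job_cluster"
    }
    notebook_task {
      notebook_path = "/Shared/report" # placeholder notebook
    }
  }
}
```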

#### condition_task Configuration Block

The `condition_task` specifies a condition with an outcome that can be used to control the execution of dependent tasks.

* `left` - The left operand of the condition task. It can be a string value, a job state, or a parameter reference.
* `right` - The right operand of the condition task. It can be a string value, a job state, or a parameter reference.
* `op` - The string specifying the operation used to compare operands. Currently, the following operators are supported: `EQUAL_TO`, `GREATER_THAN`, `GREATER_THAN_OR_EQUAL`, `LESS_THAN`, `LESS_THAN_OR_EQUAL`, `NOT_EQUAL`. (Check the [API docs](https://docs.databricks.com/api/workspace/jobs/create) for the latest information.)

This task does not require a cluster to execute and does not support retries or notifications.
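
As a sketch, a condition task gating a downstream task might look like the following; the `{{job.parameters.*}}` reference syntax and the `outcome` attribute on `depends_on` are assumptions drawn from the Jobs API rather than this section:

```hcl
task {
  task_key = "check_environment"
  condition_task {
    left  = "{{job.parameters.environment}}" # assumed parameter-reference syntax
    op    = "EQUAL_TO"
    right = "production"
  }
}

task {
  task_key = "deploy"
  depends_on {
    task_key = "check_environment"
    outcome  = "true" # assumed attribute: run only when the condition evaluates to true
  }
  notebook_task {
    notebook_path = "/Shared/deploy" # placeholder notebook
  }
}
```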

#### dbt_task Configuration Block

* `commands` - (Required) (Array) Series of dbt commands to execute in sequence. Every command must start with "dbt".
* `source` - (Optional) The source of the project. Possible values are `WORKSPACE` and `GIT`. Defaults to `GIT` if a `git_source` block is present in the job definition.
* `project_directory` - (Required when `source` is `WORKSPACE`) The path where dbt should look for `dbt_project.yml`. Equivalent to passing `--project-dir` to the dbt CLI.
  * If `source` is `GIT`: Relative path to the directory in the repository specified in the `git_source` block. Defaults to the repository's root directory when not specified.
  * If `source` is `WORKSPACE`: Absolute path to the folder in the workspace.
* `profiles_directory` - (Optional) The relative path to the directory in the repository specified by `git_source` where dbt should look for the `profiles.yml` file. If not specified, defaults to the repository's root directory. Equivalent to passing `--profile-dir` to a dbt command.
* `catalog` - (Optional) The name of the catalog to use inside Unity Catalog.
* `schema` - (Optional) The name of the schema dbt should run in. Defaults to `default`.
* `warehouse_id` - (Optional) The ID of the SQL warehouse that dbt should execute against.

You also need to include a `git_source` block to configure the repository that contains the dbt project.
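
For example, a dbt task that runs against a Git-hosted project could be sketched as follows; the `git_source` attributes and the placeholder names and IDs are assumptions taken from the wider resource documentation:

```hcl
resource "databricks_job" "dbt_example" {
  name = "dbt-sketch"

  git_source {
    url      = "https://github.com/example/dbt-project" # placeholder repository
    provider = "gitHub"
    branch   = "main"
  }

  task {
    task_key = "dbt_run"
    dbt_task {
      commands     = ["dbt deps", "dbt seed", "dbt run"] # every command starts with "dbt"
      schema       = "analytics"                         # placeholder schema
      warehouse_id = "0123456789abcdef"                  # placeholder SQL warehouse ID
      # source defaults to GIT because a git_source block is present
    }
  }
}
```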

#### for_each_task Configuration Block

* `concurrency` - (Optional) Controls the number of active iteration task runs. The default is 20; the maximum allowed is 100.
* `inputs` - (Required) (String) Array for the task to iterate on. This can be a JSON string or a reference to an array parameter.
* `task` - (Required) Task to run against the `inputs` list.
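
A minimal sketch of an iterated task follows; the use of `jsonencode` for `inputs` and the `{{input}}` reference inside the nested task are assumptions, not attributes defined in this section:

```hcl
task {
  task_key = "per_country"
  for_each_task {
    inputs      = jsonencode(["US", "DE", "JP"]) # JSON string array to iterate on
    concurrency = 2                              # at most two iterations run at once

    task {
      task_key = "per_country_iteration"
      notebook_task {
        notebook_path   = "/Shared/process_country" # placeholder notebook
        base_parameters = { country = "{{input}}" } # assumed reference to the current iteration value
      }
    }
  }
}
```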

#### notebook_task Configuration Block

#### python_wheel_task Configuration Block

* `parameters` - (Optional) Parameters for the task
* `named_parameters` - (Optional) Named parameters for the task

#### run_job_task Configuration Block

* `job_id` - (Required) (String) ID of the job
* `job_parameters` - (Optional) (Map) Job parameters for the task
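
For example, a task that triggers another job managed in the same configuration could be sketched as below; the referenced `databricks_job.downstream` resource and the parameter values are placeholders:

```hcl
task {
  task_key = "trigger_downstream"
  run_job_task {
    job_id = databricks_job.downstream.id # assumes another databricks_job resource named "downstream"
    job_parameters = {
      run_date = "2024-01-01" # placeholder job parameter
    }
  }
}
```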

#### spark_jar_task Configuration Block

* `parameters` - (Optional) (List) Parameters passed to the main method.
* `main_class_name` - (Optional) The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code should use `SparkContext.getOrCreate` to obtain a Spark context; otherwise, runs of the job will fail.
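
A minimal sketch, assuming the JAR is attached through the task-level `library` block described earlier; the class name and JAR path are placeholders:

```hcl
task {
  task_key = "jar_step"
  spark_jar_task {
    main_class_name = "com.example.Main" # placeholder class; it should call SparkContext.getOrCreate
    parameters      = ["--date", "2024-01-01"]
  }
  library {
    jar = "dbfs:/FileStore/jars/example.jar" # assumed library sub-attribute pointing at the JAR
  }
}
```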

#### spark_python_task Configuration Block

* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`), workspace paths, and remote repositories are supported. For Python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. For files stored in a remote repository, the path must be relative.
* `source` - (Optional) Location type of the Python file; can only be `GIT`. When set to `GIT`, the Python file will be retrieved from a Git repository defined in `git_source`.
* `parameters` - (Optional) (List) Command-line parameters passed to the Python file.
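
For example, running a script from a remote repository could look like the sketch below; the relative path and parameters are placeholders, and a job-level `git_source` block is assumed to be configured as described above:

```hcl
task {
  task_key = "python_step"
  spark_python_task {
    python_file = "jobs/main.py"   # relative path because source is GIT
    source      = "GIT"
    parameters  = ["--env", "dev"] # placeholder command-line parameters
  }
}
```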

#### spark_submit_task Configuration Block

You can invoke Spark submit tasks only on new clusters. **In the `new_cluster` specification, `libraries` and `spark_conf` are not supported**. Instead, use `--jars` and `--py-files` to add Java and Python libraries and `--conf` to set the Spark configuration. By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set `--driver-memory` and `--executor-memory` to smaller values to leave some room for off-heap usage. **Please use `spark_jar_task`, `spark_python_task`, or `notebook_task` wherever possible**.

* `parameters` - (Optional) (List) Command-line parameters passed to spark submit.
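
A minimal sketch, assuming a placeholder application JAR and a `new_cluster` body taken from the wider resource schema; the parameters are passed straight through to spark-submit:

```hcl
task {
  task_key = "legacy_submit"
  new_cluster {
    spark_version = "15.4.x-scala2.12" # placeholder; libraries and spark_conf are not supported here
    node_type_id  = "i3.xlarge"        # placeholder node type
    num_workers   = 2
  }
  spark_submit_task {
    parameters = [
      "--class", "com.example.Main",      # placeholder main class
      "--driver-memory", "4g",            # leave room for off-heap usage
      "dbfs:/FileStore/jars/example.jar", # placeholder application JAR
      "--input", "/mnt/data"              # placeholder application argument
    ]
  }
}
```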