Commit a1a9af7

Update transform-data-databricks-job.md
updated some of the technical details
1 parent 0ec40a6 commit a1a9af7

File tree

1 file changed: +6 −83 lines


articles/data-factory/transform-data-databricks-job.md

Lines changed: 6 additions & 83 deletions
@@ -43,16 +43,10 @@ Here's the sample JSON definition of a Databricks Job Activity:
             "type": "LinkedServiceReference"
         },
         "typeProperties": {
-            "jobPath": "/Users/[email protected]/ScalaExampleJob",
-            "baseParameters": {
-                "inputpath": "input/folder1/",
-                "outputpath": "output/"
+            "jobId": "012345678910112",
+            "jobParameters": {
+                "testParameter": "testValue"
             },
-            "libraries": [
-                {
-                    "jar": "dbfs:/docs/library.jar"
-                }
-            ]
         }
     }
 }
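
Reassembled, the sample definition after this commit reads roughly as follows (a sketch reconstructed from the hunk above; the activity name, description, and linked-service reference name are illustrative placeholders, not part of the commit):

```json
{
    "activity": {
        "name": "MyDatabricksJobActivity",
        "description": "Runs an existing Databricks job",
        "type": "DatabricksJob",
        "linkedServiceName": {
            "referenceName": "MyDatabricksLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "jobId": "012345678910112",
            "jobParameters": {
                "testParameter": "testValue"
            }
        }
    }
}
```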
@@ -69,82 +63,11 @@ definition:
 |description|Text describing what the activity does.|No|
 |type|For Databricks Job Activity, the activity type is DatabricksJob.|Yes|
 |linkedServiceName|Name of the Databricks Linked Service on which the Databricks job runs. To learn about this linked service, see [Compute linked services](compute-linked-services.md) article.|Yes|
-|jobPath|The absolute path of the job to be run in the Databricks Workspace. This path must begin with a slash.|Yes|
-|baseParameters|An array of Key-Value pairs. Base parameters can be used for each activity run. If the job takes a parameter that isn't specified, the default value from the job will be used. Find more on parameters in [Databricks Jobs](https://docs.databricks.com/api/latest/jobs.html#jobsparampair).|No|
-|libraries|A list of libraries to be installed on the cluster that will execute the job. It can be an array of \<string, object>.|No|
+|jobId|The ID of the job to be run in the Databricks Workspace.|Yes|
+|jobParameters|An array of Key-Value pairs. Job parameters can be used for each activity run. If the job takes a parameter that isn't specified, the default value from the job will be used. Find more on parameters in [Databricks Jobs](https://docs.databricks.com/api/latest/jobs.html#jobsparampair).|No|

-## Supported libraries for Databricks activities
-
-In the above Databricks activity definition, you specify these library types: *jar*, *egg*, *whl*, *maven*, *pypi*, *cran*.
-
-```json
-{
-    "libraries": [
-        {
-            "jar": "dbfs:/mnt/libraries/library.jar"
-        },
-        {
-            "egg": "dbfs:/mnt/libraries/library.egg"
-        },
-        {
-            "whl": "dbfs:/mnt/libraries/mlflow-0.0.1.dev0-py2-none-any.whl"
-        },
-        {
-            "whl": "dbfs:/mnt/libraries/wheel-libraries.wheelhouse.zip"
-        },
-        {
-            "maven": {
-                "coordinates": "org.jsoup:jsoup:1.7.2",
-                "exclusions": [ "slf4j:slf4j" ]
-            }
-        },
-        {
-            "pypi": {
-                "package": "simplejson",
-                "repo": "http://my-pypi-mirror.com"
-            }
-        },
-        {
-            "cran": {
-                "package": "ada",
-                "repo": "https://cran.us.r-project.org"
-            }
-        }
-    ]
-}
-
-```
-
-For more information, see the [Databricks documentation](/azure/databricks/dev-tools/api/latest/libraries#managedlibrarieslibrary) for library types.

 ## Passing parameters between jobs and pipelines

-You can pass parameters to jobs using *baseParameters* property in databricks activity.
-
-In certain cases, you might require to pass back certain values from job back to the service, which can be used for control flow (conditional checks) in the service or be consumed by downstream activities (size limit is 2 MB).
-
-1. In your job, you can call `dbutils.job.exit("returnValue")` and corresponding "returnValue" will be returned to the service.
-
-1. You can consume the output in the service by using expression such as `@{activity('databricks job activity name').output.runOutput}`.
-
-> [!IMPORTANT]
-> If you're passing JSON object, you can retrieve values by appending property names. Example: `@{activity('databricks job activity name').output.runOutput.PropertyName}`
-
-## How to upload a library in Databricks
-
-### You can use the Workspace UI:
-
-1. [Use the Databricks workspace UI](/azure/databricks/libraries/cluster-libraries#install-a-library-on-a-cluster)
-
-2. To obtain the dbfs path of the library added using UI, you can use [Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#list-the-contents-of-a-directory).
-
-   Typically the Jar libraries are stored under dbfs:/FileStore/jars while using the UI. You can list all through the CLI: *databricks fs ls dbfs:/FileStore/job-jars*
-
-### Or you can use the Databricks CLI:
-
-1. Follow [Copy the library using Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#copy-a-directory-or-a-file)
-
-2. Use Databricks CLI [(installation steps)](/azure/databricks/dev-tools/cli/commands#compute-commands)
+You can pass parameters to jobs using the *jobParameters* property in the Databricks activity.

-   As an example, to copy a JAR to dbfs:
-   `dbfs cp SparkPi-assembly-0.1.jar dbfs:/docs/sparkpi.jar`
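
Because the replacement sentence above is terse, here is a hedged sketch of what passing a pipeline value into *jobParameters* could look like (the `inputPath` parameter name is an invented example, and this assumes the service's usual `@pipeline().parameters` expression evaluation applies to these properties):

```json
"typeProperties": {
    "jobId": "012345678910112",
    "jobParameters": {
        "inputPath": "@pipeline().parameters.inputPath"
    }
}
```

At run time the expression would resolve to the pipeline parameter's current value, which then overrides the job parameter's default, per the *jobParameters* row in the table above.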
