
Commit fc4e459

Update quickstart for extension v2 (#1327)
Co-authored-by: Julia Crawford (Databricks) <[email protected]>
1 parent ec5d012 commit fc4e459

File tree

6 files changed: +95 −67 lines changed
Lines changed: 94 additions & 64 deletions
# Databricks extension for Visual Studio Code

The Databricks extension for Visual Studio Code enables you to connect to your remote Databricks workspaces from Visual Studio Code.

> 📘 **Note**: The [User Guide](https://docs.databricks.com/dev-tools/vscode-ext.html) contains comprehensive documentation about the Databricks extension.
# Features

- Define, deploy, and run Databricks Asset Bundles to apply CI/CD patterns to your Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks.
- Run local Python code files on Databricks clusters.
- Run notebooks and local Python code files as Databricks jobs.
- Set up and configure your debugging environment and Databricks Connect using a simple checklist that triggers selection dialogs.
- Debug notebooks cell by cell with Databricks Connect.
- Synchronize local code with code in your Databricks workspace.

## <a id="toc"></a>Table of Contents
- [Getting Started](#setup-steps)
  - [Create a Databricks project](#create-databricks-project)
  - [Select a cluster](#select-cluster)
  - [Run Python code](#running-code)
    - [Running Python files](#running-pyspark-code)
    - [Running Notebooks as Workflows](#running-code-as-workflows)
    - [Debugging and running Notebooks cell-by-cell using Databricks Connect](#running-notebook)
  - [Deploying Databricks Asset Bundles](#dabs)
    - [What are Databricks Asset Bundles?](#what-is-dab)
    - [Deploying Databricks Asset Bundles](#deploy-dab)
    - [Run a Job or Pipeline](#deploy-run-job-pipeline)
- [Changes from v1](#changes-from-v1)
  - [Migrate a project from Databricks extension v1 to v2](#migrate-from-v1)
  - [What is databricks.yml?](#what-is-databricksyml)
  - [No environment variables in terminals](#no-env-vars)

---

# <a id="setup-steps"></a>Getting Started

## <a id="create-databricks-project"></a>Create a Databricks project

1. Open the Databricks extension panel by clicking on the Databricks icon in the left sidebar.
2. Click on the "Create a new Databricks project" button.
3. Follow the selection dialogs to create a Databricks configuration profile or select an existing one.
4. Select a folder to create your project in.
5. Follow the selection dialogs to create a new Databricks project.
6. Select the newly created project to open it, using the selector that appears.
7. VS Code reopens with the new project loaded, and the extension automatically logs in using the selected profile.

![create-databricks-project](./images/dabs_vsc.gif)

If your folder contains multiple [Databricks Asset Bundles](#dabs), you can select which one to use by clicking the "Open Existing Databricks project" button and selecting the desired project.

## <a id="select-cluster"></a>Select a cluster
The extension uses an interactive cluster to run code. To select an interactive cluster:

1. Open the Databricks panel by clicking on the Databricks icon in the left sidebar.
2. Click on the "Select Cluster" button.
   - If you wish to change the selected cluster later, click on the "Configure Cluster" gear icon next to the name of the selected cluster.

## <a id="running-code"></a>Run Python code
Once you have configured your project, you can deploy your local code to the selected Databricks workspace and run it on a cluster.

### <a id="running-pyspark-code"></a>Running Python files

1. Create a Python file.
2. Add PySpark code to the file.
3. Click the "Databricks Run" icon in the tab bar and select "Upload and Run File on Databricks".

This deploys the code to the selected Databricks workspace and runs it on the cluster. The result is printed in the "debug" output panel.

![run-python-code](./images/cmd-exec-run.gif)
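For example, a minimal PySpark file that works with "Upload and Run File on Databricks" might look like the following sketch (the file name and the generated data are just illustrations):

```python
# hello_databricks.py: a hypothetical file for "Upload and Run File on Databricks".
from pyspark.sql import SparkSession

# On a Databricks cluster this attaches to the cluster's existing Spark session.
spark = SparkSession.builder.getOrCreate()

# Build a tiny DataFrame in place so the example has no external dependencies.
df = spark.range(10).toDF("n")
df.selectExpr("n", "n * n AS n_squared").show()
```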
### <a id="running-code-as-workflows"></a>Running Notebooks as a Workflow

1. Create a Python file or a Python-based notebook.
   1. You can create a Python-based notebook by exporting a notebook from the Databricks web application, or use a notebook that is already tracked in Git, such as https://github.com/databricks/notebook-best-practices.
2. Click the "Databricks Run" icon in the tab bar and select "Run File as Workflow on Databricks".

This will run the file using the Jobs API on the configured cluster and render the result in a WebView.
![run-as-workflow](./images/run-as-workflow.gif)
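An exported Python-based notebook is a plain `.py` file whose cells are marked with Databricks comment markers, as in this minimal sketch (the data and column names are made up for illustration):

```python
# Databricks notebook source
# The line above marks this file as a Python-based Databricks notebook;
# cells are separated by "# COMMAND ----------" markers.
data = [("alice", 1), ("bob", 2)]

# COMMAND ----------

# `spark` is predefined when this runs on a Databricks cluster.
df = spark.createDataFrame(data, ["name", "value"])
df.show()
```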
### <a id="running-notebook"></a>Debugging and running Notebooks cell-by-cell using Databricks Connect

The extension makes it easy to run and debug notebooks cell by cell locally using Databricks Connect. For more details on how to set up Databricks Connect, refer to the [full docs](https://docs.databricks.com/en/dev-tools/vscode-ext/notebooks.html).
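With Databricks Connect configured, local code (including notebook cells) executes against the remote cluster through a `DatabricksSession`. A minimal sketch, assuming the `databricks-connect` package is installed and the `samples.nyctaxi.trips` sample dataset is available in your workspace:

```python
# Code here runs locally but executes on the remote cluster via Databricks Connect.
from databricks.connect import DatabricksSession

# Picks up the workspace and cluster configured by the extension.
spark = DatabricksSession.builder.getOrCreate()

trips = spark.read.table("samples.nyctaxi.trips")
trips.limit(5).show()
```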
## <a id="dabs"></a>Deploying Databricks Asset Bundles

### <a id="what-is-dab"></a>What are Databricks Asset Bundles?

Databricks Asset Bundles make it possible to describe Databricks resources such as jobs, pipelines, and notebooks as source files. These source files provide an end-to-end definition of a project, including how it should be structured, tested, and deployed, which makes it easier to collaborate on projects during active development. For more information, see [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html).
### <a id="deploy-dab"></a>Deploying Databricks Asset Bundles

1. In the Databricks extension panel, find the "Bundle Resource Explorer" view.
2. Click on the "Deploy" button.
3. Monitor the deployment status in the log output window.

![deploy](./images/deploy.gif)
### <a id="deploy-run-job-pipeline"></a>Run a Job or Pipeline

You can run a job or a pipeline managed by Databricks Asset Bundles from the "Bundle Resource Explorer" view.

1. In the Databricks extension panel, find the "Bundle Resource Explorer" view.
2. Hover over the job or pipeline that you want to run.
3. Click on the "Run" button.

This deploys the bundle and runs the job or pipeline. You can monitor the run progress in the output terminal window. You can also open the run, job, or pipeline in the workspace by clicking on the "Open link externally" button.

![run-job-pipeline](./images/deploy-and-run.gif)
#### Use the interactive cluster for running jobs

By default, a job runs on a jobs cluster. You can change this behavior and instead run the job on the interactive cluster you selected earlier (see [Select a cluster](#select-cluster)).

1. In the Databricks extension panel, find the "Configuration" view.
2. Check the "Override Jobs cluster in bundle" checkbox.
# <a id="changes-from-v1"></a>Key behavior changes for users of Databricks extension v1

## <a id="migrate-from-v1"></a>Migrate a project from Databricks extension v1 to v2
If you are using Databricks extension v1, your project will automatically be migrated to a [Databricks Asset Bundle](#what-is-dab) when you open it in v2. The migration process creates a new [`databricks.yml`](#what-is-databricksyml) file in the root of your project and moves the configuration from the old `.databricks/project.json` to the new `databricks.yml` file.

> **Note**: This means that you will start seeing a `databricks.yml` file in your project root directory and in your version control system change logs. We recommend committing this file to your version control system.
## <a id="what-is-databricksyml"></a>What is databricks.yml?

A `databricks.yml` file is a configuration file that describes a bundle. It contains configuration such as the workspace host and definitions of resources such as jobs and pipelines. For more information on `databricks.yml`, refer to the [full docs](https://docs.databricks.com/en/dev-tools/bundles/index.html).
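A minimal sketch of what such a file can look like, with placeholder names and host, and with cluster settings for the job omitted for brevity (see the bundle docs above for the full schema):

```yaml
# databricks.yml: a placeholder bundle describing one job.
bundle:
  name: my_project

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com  # placeholder host

resources:
  jobs:
    hello_job:
      name: hello_job
      tasks:
        - task_key: main
          spark_python_task:
            python_file: ./hello_databricks.py  # placeholder path
```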
## <a id="no-env-vars"></a>No environment variables in terminals
Environment variables in terminals are no longer supported. If you were using environment variables in v1, you will need to manually load the `.databricks/.databricks.env` file in your terminal before running any commands.
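For example, in a POSIX shell you could load the file like this sketch (assuming it contains plain `KEY=value` lines; `my_script.py` is a placeholder):

```sh
set -a                           # export every variable assigned below
. .databricks/.databricks.env    # load the extension-generated env file
set +a

python my_script.py              # placeholder command that needs the variables
```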

packages/databricks-vscode/src/configuration/auth/DatabricksCliCheck.ts

Lines changed: 1 addition & 3 deletions
```diff
@@ -112,9 +112,7 @@ export class DatabricksCliCheck implements Disposable {
         );
     } catch (e: any) {
         throw new Error(
-            `Login failed with Databricks CLI failed: ${
-                e.stderr || e.message
-            }`
+            `Login failed with Databricks CLI: ${e.stderr || e.message}`
         );
     }
 }
```
