# Databricks extension for Visual Studio Code

The Databricks extension for Visual Studio Code enables you to connect to your remote Databricks workspaces from within Visual Studio Code.

> 📘 **Note**: The [User Guide](https://docs.databricks.com/dev-tools/vscode-ext.html) contains comprehensive documentation about the Databricks extension.

# Features

- Define, deploy, and run Databricks Asset Bundles to apply CI/CD patterns to your Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks.
- Run local Python code files on Databricks clusters.
- Run notebooks and local Python code files as Databricks jobs.
- Set up and configure your debugging environment and Databricks Connect using a simple checklist that triggers selection dialogs.
- Debug notebooks cell by cell with Databricks Connect.
- Synchronize local code with code in your Databricks workspace.

## <a id="toc"></a>Table of Contents

- [Getting Started](#setup-steps)
  - [Create a Databricks project](#create-databricks-project)
  - [Select a cluster](#select-cluster)
  - [Run Python code](#running-code)
    - [Running Python files](#running-pyspark-code)
    - [Running Notebooks as Workflows](#running-code-as-workflows)
    - [Debugging and running Notebooks cell-by-cell using Databricks Connect](#running-notebook)
  - [Deploying Databricks Asset Bundles](#dabs)
    - [What are Databricks Asset Bundles?](#what-is-dab)
    - [Deploying Databricks Asset Bundles](#deploy-dab)
    - [Run a Job or Pipeline](#deploy-run-job-pipeline)
- [Changes from v1](#changes-from-v1)
  - [Migrate a project from Databricks extension v1 to v2](#migrate-from-v1)
  - [What is databricks.yml?](#what-is-databricksyml)
  - [No environment variables in terminals](#no-env-vars)

---

# <a id="setup-steps"></a>Getting Started

## <a id="create-databricks-project"></a>Create a Databricks project

1. Open the Databricks extension panel by clicking on the Databricks icon in the left sidebar.
2. Click on the "Create a new Databricks project" button.
3. Follow the selection dialogs to create a Databricks configuration profile or select an existing one.
4. Select a folder to create your project in.
5. Follow the selection dialogs to create a new Databricks project.
6. Select the newly created project to open it, using the selector that appears.
7. VS Code will reopen with the new project loaded, and the extension will automatically log in using the selected profile.

If your folder contains multiple [Databricks Asset Bundles](#dabs), you can select which one to use by clicking the "Open Existing Databricks project" button and selecting the desired project.

## <a id="select-cluster"></a>Select a cluster

The extension uses an interactive cluster to run code. To select an interactive cluster:

1. Open the Databricks panel by clicking on the Databricks icon in the left sidebar.
2. Click on the "Select Cluster" button.
   - If you wish to change the selected cluster, click on the "Configure Cluster" gear icon next to the name of the selected cluster.

## <a id="running-code"></a>Run Python code

Once you have your project configured, you can deploy your local code to the selected Databricks workspace and run it on a cluster.

### <a id="running-pyspark-code"></a>Running Python files

1. Create a Python file.
2. Add PySpark code to the Python file.
3. Click the "Databricks Run" icon in the tab bar and select "Upload and Run File on Databricks".

This will deploy the code to the selected Databricks workspace and run it on the cluster. The result is printed in the "debug" output panel.
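
The following is a minimal sketch of such a file, assuming only the standard PySpark API; the file name and column name are illustrative:

```python
# hello_spark.py -- a small PySpark file suitable for "Upload and Run File on Databricks".
from pyspark.sql import SparkSession

# On a Databricks cluster, getOrCreate() returns the cluster's existing Spark session.
spark = SparkSession.builder.getOrCreate()

# Build a tiny DataFrame and print it; the output appears in the "debug" panel.
df = spark.range(10).withColumnRenamed("id", "n")
df.show()
```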

### <a id="running-code-as-workflows"></a>Running Notebooks as a Workflow

1. Create a Python file or a Python-based notebook.
   1. You can create a Python-based notebook by exporting a notebook from the Databricks web application, or use a notebook that is already tracked in git, such as https://github.com/databricks/notebook-best-practices.
2. Click the "Databricks Run" icon in the tab bar and select "Run File as Workflow on Databricks".

This will run the file using the Jobs API on the configured cluster and render the result in a WebView.
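
For reference, a Python-based notebook exported from Databricks is a plain `.py` file whose cells are delimited by special comment markers, roughly as in this sketch (the cell contents are illustrative):

```python
# Databricks notebook source
# First cell: `spark` is provided by the Databricks runtime when this runs as a notebook.
from pyspark.sql import functions as F

# COMMAND ----------

# Second cell: aggregate a small range and display the result.
df = spark.range(100)
df.select(F.sum("id").alias("total")).show()
```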

### <a id="running-notebook"></a>Debugging and running Notebooks cell-by-cell using Databricks Connect

The extension provides easy setup for running and debugging notebooks cell by cell locally using Databricks Connect. For more details on how to set up Databricks Connect, refer to the [full docs](https://docs.databricks.com/en/dev-tools/vscode-ext/notebooks.html).
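
Conceptually, Databricks Connect lets locally executed code drive a remote cluster. A minimal sketch, assuming the `databricks-connect` package is installed and the extension has configured your connection:

```python
# Run a query on the remote cluster from locally executed Python code.
from databricks.connect import DatabricksSession

# Picks up host, auth, and cluster details from the project's Databricks configuration.
spark = DatabricksSession.builder.getOrCreate()

df = spark.sql("SELECT 1 AS one")
df.show()  # Executes remotely; results are returned to the local process.
```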

## <a id="dabs"></a>Deploying Databricks Asset Bundles

### <a id="what-is-dab"></a>What are Databricks Asset Bundles?

Databricks Asset Bundles make it possible to describe Databricks resources such as jobs, pipelines, and notebooks as source files. These source files provide an end-to-end definition of a project, including how it should be structured, tested, and deployed, which makes it easier to collaborate on projects during active development. For more information, see [Databricks Asset Bundles](https://docs.databricks.com/en/dev-tools/bundles/index.html).

### <a id="deploy-dab"></a>Deploying Databricks Asset Bundles

1. In the Databricks extension panel, find the "Bundle Resource Explorer" view.
2. Click on the "Deploy" button.
3. Monitor the deployment status in the log output window.
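
Because the project is a bundle, the same deployment can also be performed from a terminal with the Databricks CLI (a sketch; the `dev` target name is an assumption and depends on the targets defined in your `databricks.yml`):

```bash
# Validate the bundle configuration, then deploy it to the "dev" target.
databricks bundle validate
databricks bundle deploy -t dev
```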

### <a id="deploy-run-job-pipeline"></a>Run a Job or Pipeline

You can run a job or a pipeline managed by Databricks Asset Bundles from the "Bundle Resource Explorer" view.

1. In the Databricks extension panel, find the "Bundle Resource Explorer" view.
2. Hover over the job or pipeline that you want to run.
3. Click on the "Run" button.

This deploys the bundle and runs the job or pipeline. You can monitor the run progress in the output terminal window. You can also open the run, job, or pipeline in the workspace by clicking on the "Open link externally" button.
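
A deployed job or pipeline can likewise be triggered from a terminal (a sketch; `my_job` stands for a resource key defined under `resources` in your `databricks.yml`):

```bash
# Run the bundle resource with key "my_job", streaming status to the terminal.
databricks bundle run my_job
```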

#### Use the interactive cluster for running jobs

By default, a job is run using a jobs cluster. You can change this behavior and use the interactive cluster selected previously (see [Select a cluster](#select-cluster)) to run the job.

1. In the Databricks extension panel, find the "Configuration" view.
2. Check the "Override Jobs cluster in bundle" checkbox.

# <a id="changes-from-v1"></a>Key behavior changes for users of Databricks extension v1

## <a id="migrate-from-v1"></a>Migrate a project from Databricks extension v1 to v2

If you are using Databricks extension v1, your project will automatically be migrated to a [Databricks Asset Bundle](#what-is-dab) when you open it in v2. The migration process creates a new [`databricks.yml`](#what-is-databricksyml) file in the root of your project and moves the configuration from the old `.databricks/project.json` to the new `databricks.yml` file.

> **Note**: This means that you will start seeing a `databricks.yml` file in your project root directory and in your version control system change logs. We recommend committing this file to your version control system.

## <a id="what-is-databricksyml"></a>What is databricks.yml?

A `databricks.yml` file is a configuration file that describes a bundle. It contains configuration such as the workspace host and definitions of resources such as jobs and pipelines. For more information on `databricks.yml`, refer to the [full docs](https://docs.databricks.com/en/dev-tools/bundles/index.html).
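
A minimal sketch of what such a file can look like (the bundle name, host, and resource definitions below are placeholders, not defaults):

```yaml
# databricks.yml -- top-level bundle configuration.
bundle:
  name: my_project

# Workspace the bundle deploys to.
workspace:
  host: https://my-workspace.cloud.databricks.com

# Resources managed by the bundle, e.g. a job with a single notebook task.
resources:
  jobs:
    my_job:
      name: my-job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main.py
```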

## <a id="no-env-vars"></a>No environment variables in terminals

Environment variables in terminals are no longer supported. If you were using environment variables in v1, you will need to manually load the `.databricks/.databricks.env` file in your terminal before running any commands, as sketched below.
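
One way to do this in a bash or zsh terminal (`set -a` exports every variable the file defines into the current shell):

```bash
# Export all variables defined in the extension's env file into this shell.
set -a
source .databricks/.databricks.env
set +a
```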