1. Installation

- Installing official build tools releases
- Installing snapshot releases
- Switching between snapshot and release versions
- Configuring `databricks`
- Cleaning up "dangling" Docker images
The build tools are written in Python 3. They will not work with Python 2.
The only supported way to install the tools is via Docker.
Note: If you don't already have Docker installed, see Installing Docker.
To install or update the build tools, just run:
$ curl -L https://git.io/fhaLg | bash
NOTE: If you have the latest set of shell aliases installed and active, you can just type:
$ update-tools # or, update_tools
That command is equivalent to:
$ curl https://raw.githubusercontent.com/databricks-edu/build-tooling/master/docker/install.sh | bash
This command:
- Pulls down the prebuilt Docker image (`databrickseducation/build-tool:latest`) from Docker Hub.
- Updates your local Docker image, if necessary.
- Pulls down the build tool aliases and installs them in `$HOME/.build-tools-aliases.sh`.
All you have to do is ensure that Docker is installed (see below) and that you have this command in your `.bashrc` or `.zshrc`:

. ~/.build-tools-aliases.sh
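If that line isn't already in your shell startup file, one way to add it (shown here for `.bashrc`; adjust for `.zshrc`) is:

$ echo '. ~/.build-tools-aliases.sh' >> ~/.bashrc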
That aliases file defines command aliases for `bdc`, `gendbc`, `master_parse`, `databricks`, and `course`; those aliases invoke the corresponding commands with the Docker image.
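For context, each of those aliases is a thin wrapper around `docker run`. A minimal sketch of what such an alias might look like (the actual aliases file may mount different directories and pass different flags) is:

# Hypothetical sketch; see ~/.build-tools-aliases.sh for the real definitions.
alias bdc='docker run -it --rm -v "$HOME:$HOME" -w "$PWD" databrickseducation/build-tool:${BUILD_TOOL_DOCKER_TAG:-latest} bdc'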
From time to time, we push preliminary versions of the build tools to the `snapshot` branch. You can install and use a snapshot version by following this procedure.
- Run `update-tools snapshot` (or `curl -L https://git.io/fhaLg | bash -s snapshot`).
- Switch to the snapshot version with `dbe snapshot`. (See below.)
- Ensure that you're using the latest version of the build tool aliases. They respect the `BUILD_TOOL_DOCKER_TAG` environment variable, described below.
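Putting those steps together, a typical snapshot installation looks like this (assuming the aliases are loaded in your current shell):

$ update-tools snapshot
$ dbe snapshot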
This procedure installs the snapshot release into a separate Docker image (`databrickseducation/build-tool:snapshot`). It will not conflict with the installation of the release version of the build tools; that version is always installed in `databrickseducation/build-tool:latest`.
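You can confirm that both images are present with a standard `docker images` query:

$ docker images databrickseducation/build-tool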
To switch back to using the release version, simply run
$ dbe latest
The aliases file (`~/.build-tools-aliases.sh`) installs a command called `dbe` for switching between the snapshot and release versions.
$ dbe latest # use release version
$ dbe snapshot # use snapshot version
$ dbe # display what version you're using
This command is equivalent to setting the `BUILD_TOOL_DOCKER_TAG` environment variable to `snapshot` or `latest`.
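If you prefer, you can set that variable yourself; exporting it has the same effect as running `dbe`, since the aliases read it:

$ export BUILD_TOOL_DOCKER_TAG=snapshot   # same effect as: dbe snapshot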
You'll also want to configure the `databricks` command, if you haven't already done so. You don't have to install it: there's a version already installed in the Docker image, and the shell aliases define a `databricks` alias that invokes the Docker version.

But you do have to configure it, so you can use `course` or `bdc` to upload and download your notebooks. You'll need a configuration section for each Databricks workspace you'll be using for notebook development. For each such workspace, you'll have to set up authentication.

The `databricks` command supports username and password authentication, as well as API token authentication. However, the build tools only support API token authentication.
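For reference, the `databricks` CLI can create a token-based profile interactively, prompting you for the workspace URL and an API token (the `azure` profile name below is just an illustration):

$ databricks configure --token --profile azure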
Here's a sample `~/.databrickscfg` file:
[DEFAULT]
host = https://trainers.cloud.databricks.com/
token = dapi9b1bd21f3cb79f26c5103d28d667967e
[azure]
host = https://eastus2.azuredatabricks.net
token = dapi24426c433e561579a55c7a80f0f1c9c1
Note that the `DEFAULT` profile is special.

- If you don't specify a profile when using the tools (or the `databricks` command), they all assume `DEFAULT`.
- In addition, if you do specify a profile (e.g., `azure`), any missing fields in that section of `.databrickscfg` come from `DEFAULT`.
For instance, consider this example:
[DEFAULT]
host = https://trainers.cloud.databricks.com/
token = dapi9b1bd21f3cb79f26c5103d28d667967e
[azure]
host = https://eastus2.azuredatabricks.net
If you tried to invoke, say, `databricks workspace ls --profile azure /`, the `databricks` command would use the `host` value from the `[azure]` section and the `token` value from the `DEFAULT` section (because `token` is missing from the `[azure]` section). This is probably not what you want.
You'll also want to set your Databricks home directory. Both `bdc` and `course` need to know your home directory in Databricks, for various operations. You can set this value in several ways.
- With `course`, you can set it in the `course` configuration, by setting `DB_SHARD_HOME`. When using `course`, the `course` configuration overrides all other ways of setting your home.
- You can set the `DB_SHARD_HOME` environment variable. This value takes precedence over the other methods, below. For example:

# My home directory on all Databricks instances is /Users/[email protected]
export DB_SHARD_HOME=/Users/[email protected]
- Set `home` in your `.databrickscfg` file. Note that only the build tools honor this value; the `databricks` command ignores it. Here are a couple of examples:
# My home directory is different on the default workspace than on Azure.
[DEFAULT]
host = https://trainers.cloud.databricks.com/
token = dapi9b1bd21f3cb79f26c5103d28d667967e
home = /Users/[email protected]
[azure]
host = https://eastus2.azuredatabricks.net
token = dapi24426c433e561579a55c7a80f0f1c9c1
home = /Users/[email protected]
If you have the same directory on all Databricks workspaces, you can just set it in `DEFAULT`:
[DEFAULT]
host = https://trainers.cloud.databricks.com/
token = dapi9b1bd21f3cb79f26c5103d28d667967e
home = /Users/[email protected]
[azure1]
host = https://eastus2.azuredatabricks.net
token = dapi24426c433e561579a55c7a80f0f1c9c1
[azure2]
host = https://westus2.azuredatabricks.net
token = dapi704d362303f3235cfcc505d6655eea6
- Set `username` in your `.databrickscfg`. If the tools can't find a `home` value using any of the methods above, they'll look for a `username` value in the profile configuration and calculate your home directory from that value.
[DEFAULT]
host = https://trainers.cloud.databricks.com/
token = dapi9b1bd21f3cb79f26c5103d28d667967e
username = [email protected]
In this case, assuming `DB_SHARD_HOME` isn't set in the `course` configuration or the environment, the build tools will assume your home directory is `/Users/[email protected]`.
Over time, as you update your Docker image, you might find you're accumulating a bunch of dangling (stale) Docker images. If you run `docker images`, you may see a bunch with labels like `<none>`. Some of these might be stale, and stale images can consume disk space. Consider running the following command periodically to clean things up:
$ docker rmi $(docker images -f "dangling=true" -q)
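Alternatively, newer versions of Docker include a built-in command that removes dangling images by default (it asks for confirmation first):

$ docker image prune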
NOTICE
- This software is copyright © 2017-2021 Databricks, Inc., and is released under the Apache License, version 2.0. See LICENSE.txt in the main repository for details.
- Databricks cannot support this software for you. We use it internally, and we have released it as open source, for use by those who are interested in building similar kinds of Databricks notebook-based curriculum. But this software does not constitute an official Databricks product, and it is subject to change without notice.