B. Detailed bdc Usage
Just invoke bdc with no arguments for a quick usage message.

bdc can be invoked several different ways. Each is described below.
- Getting the abbreviated usage message
- Getting the full usage message
- Show only the version
- Check your build.yaml for errors
- Get a list of the notebooks in a course
- Build a course
- Upload course notebooks to a Databricks shard
- Download course notebooks from a Databricks shard
- Relative Shard Paths
Getting the abbreviated usage message

Invoke bdc with no arguments to get a quick usage message.
Getting the full usage message

bdc -h or bdc --help
Show only the version

bdc --version
Check your build.yaml for errors

Running bdc --check against a build.yaml file parses the file and checks it for obvious problems, without actually doing anything else. bdc performs that same validation automatically when you run a build or use --upload or --download, but --check lets you force a validation check.
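For example, assuming --check accepts the same optional build-yaml argument as the other modes described on this page (the course path below is hypothetical):

bdc --check /path/to/Spark-ML-301/build.yaml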
Get a list of the notebooks in a course

bdc --list-notebooks [build-yaml]

With this command, bdc will list the full paths of all the (source) notebooks that comprise a particular course, one per line. build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.
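A typical invocation and some illustrative output (both the course path and the notebook names are hypothetical):

bdc --list-notebooks /path/to/Spark-ML-301/build.yaml
/path/to/Spark-ML-301/notebooks/01-Introduction.py
/path/to/Spark-ML-301/notebooks/02-ETL-Process-Overview.py
/path/to/Spark-ML-301/notebooks/03-Delta-Lake.py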
Build a course

bdc [-o | --overwrite] [-v | --verbose] [-d DEST | --dest DEST] [build-yaml]

This version of the command builds a course, writing the results to the specified destination directory, DEST. If you don't specify a destination directory, it defaults to $HOME/tmp/curriculum/<course-id> (e.g., $HOME/tmp/curriculum/Spark-100-105-1.8.11).

If the destination directory already exists, the build will fail unless you also specify -o (or --overwrite).

If you specify -v (or --verbose), the build process will emit various verbose messages as it builds the course.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.
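For instance, the following command (course path hypothetical) builds the course verbosely into an explicit destination directory, overwriting any previous build there:

bdc -o -v -d $HOME/tmp/curriculum/Spark-ML-301 /path/to/Spark-ML-301/build.yaml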
Upload course notebooks to a Databricks shard

You can use bdc to upload all notebooks for a course to a Databricks shard.

bdc --upload shard-path [build-yaml]

Or, if you want to use a different databricks authentication profile than DEFAULT:

bdc --upload --dbprofile profile shard-path [build-yaml]

--dbprofile (or -P) corresponds directly to the databricks command's --profile argument.
This version of the command gets the list of source notebooks from the build file and uploads them to a shard using a layout similar to the build layout. You can then edit and test the notebooks in Databricks. When you're done editing, you can use bdc to download the notebooks again. (See below.)

shard-path is the path to the folder on the Databricks shard. For instance: /Users/[email protected]/Spark-ML-301. The folder must not exist in the shard. If it already exists, the upload will abort. shard-path can be relative to your home directory. See Relative Shard Paths, below.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.
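For example (the shard folder, course path, and profile name here are all hypothetical):

bdc --upload /Users/someone@example.com/Spark-ML-301 /path/to/Spark-ML-301/build.yaml
bdc --upload -P azure-trainer /Users/someone@example.com/Spark-ML-301 /path/to/Spark-ML-301/build.yaml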
Uploads and build profiles: If two notebooks with separate profiles ("amazon" and "azure") map to the same dest value, bdc would overwrite one of them during the upload and would arbitrarily choose one on the download. To avoid that, it now adds an "az" or "am" qualifier to the uploaded file. For instance, assume build.yaml has these two notebooks (and assume typical values in notebook_defaults):
- src: 02-ETL-Process-Overview-az.py
  dest: ${target_lang}/02-ETL-Process-Overview.py
  only_in_profile: azure
- src: 02-ETL-Process-Overview-am.py
  dest: ${target_lang}/02-ETL-Process-Overview.py
  only_in_profile: amazon
Both notebooks map to the same build destination. bdc --upload will upload 02-ETL-Process-Overview-az.py as 02-az-ETL-Process-Overview.py, and it will upload 02-ETL-Process-Overview-am.py as 02-am-ETL-Process-Overview.py.
bdc always applies the am or az prefix if only_in_profile is specified, even if there are no destination conflicts. The prefix is placed after any numerals in the destination file name; if there are no numerals, it's placed at the beginning.
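For example (hypothetical file names): a notebook whose dest is 03-Delta-Lake.py and that is marked only_in_profile: azure would be uploaded as 03-az-Delta-Lake.py, while one whose dest is Capstone.py would be uploaded as az-Capstone.py.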
Download course notebooks from a Databricks shard

You can use bdc to download all notebooks for a course from a Databricks shard.

bdc --download shard-path [build-yaml]

Or, if you want to use a different databricks authentication profile than DEFAULT:

bdc --download --dbprofile profile shard-path [build-yaml]

--dbprofile (or -P) corresponds directly to the databricks command's --profile argument.
This version of the command downloads the contents of the specified Databricks shard folder to a local temporary directory. Then, for each downloaded file, bdc uses the build.yaml file to identify the original source file and copies the downloaded file over top of the original source.

shard-path is the path to the folder on the Databricks shard. For instance: /Users/[email protected]/Spark-ML-301. The folder must exist in the shard. If it doesn't exist, the download will abort. shard-path can be relative to your home directory. See Relative Shard Paths, below.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.
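For example (shard folder and course path hypothetical):

bdc --download /Users/someone@example.com/Spark-ML-301 /path/to/Spark-ML-301/build.yaml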
WARNING: If the build.yaml points to your cloned Git repository, ensure that everything is committed first. Don't download into a dirty Git repository. If the download fails or somehow screws things up, you want to be able to reset the Git repository to the state it was in before you ran the download. To reset your repository, use:

git reset --hard HEAD

This resets your repository back to the last-committed state.
Relative Shard Paths

--upload and --download can support relative shard paths, allowing you to specify foo, instead of /Users/[email protected]/foo, for instance. To enable relative shard paths, you must do one of the following:
Set DB_SHARD_HOME

You can set the DB_SHARD_HOME environment variable (e.g., in your ~/.bashrc) to specify your home path on the shard. For example:

export DB_SHARD_HOME=/Users/[email protected]
Add a home setting to ~/.databrickscfg

You can also add a home variable to ~/.databrickscfg, in the DEFAULT section. The Databricks CLI command will ignore it, but bdc will honor it. For example:

[DEFAULT]
host = https://trainers.cloud.databricks.com
token = lsakdjfaksjhasdfkjhaslku89iuyhasdkfhjasd
home = /Users/[email protected]
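With either setting in place, you can give --upload or --download a shard path relative to your home folder. For instance (all paths hypothetical), assuming a home of /Users/someone@example.com, these two commands should be equivalent:

bdc --upload Spark-ML-301 /path/to/Spark-ML-301/build.yaml
bdc --upload /Users/someone@example.com/Spark-ML-301 /path/to/Spark-ML-301/build.yaml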
NOTICE
- This software is copyright © 2017-2021 Databricks, Inc., and is released under the Apache License, version 2.0. See LICENSE.txt in the main repository for details.
- Databricks cannot support this software for you. We use it internally, and we have released it as open source, for use by those who are interested in building similar kinds of Databricks notebook-based curriculum. But this software does not constitute an official Databricks product, and it is subject to change without notice.