B. Detailed bdc Usage

Brian Clapper edited this page Oct 28, 2019 · 1 revision

bdc can be invoked several different ways. Each is described below.

Getting the abbreviated usage message

Invoke bdc with no arguments to get a quick usage message.

Getting the full usage message

bdc -h or bdc --help

Show only the version

bdc --version

Check your build.yaml for errors

Running bdc --check against a build.yaml file parses the file and checks it for obvious problems, without actually doing anything else.

bdc performs that same validation automatically when you run a build or use --upload or --download, but --check lets you force a validation check without doing anything else.

Get a list of the notebooks in a course

bdc --list-notebooks [build-yaml]

With this command, bdc will list the full paths of all the source notebooks that make up a particular course, one per line. build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.

Build a course

bdc [-o | --overwrite] [-v | --verbose] [-d DEST | --dest DEST] [build-yaml]

This version of the command builds a course, writing the results to the specified destination directory, DEST. If you don't specify a destination directory, it defaults to $HOME/tmp/curriculum/<course-id> (e.g., $HOME/tmp/curriculum/Spark-100-105-1.8.11).

If the destination directory already exists, the build will fail unless you also specify -o (or --overwrite).

If you specify -v (--verbose), the build process will emit various verbose messages as it builds the course.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.
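The default-destination rule above can be sketched as a tiny helper; default_dest is hypothetical and not part of bdc, it just illustrates how the default path is formed:

```shell
# Hypothetical helper illustrating the default destination rule described
# above; "default_dest" is not part of bdc itself.
default_dest() {
  # <course-id> (e.g., Spark-100-105-1.8.11) is appended to $HOME/tmp/curriculum
  printf '%s/tmp/curriculum/%s\n' "$HOME" "$1"
}

default_dest "Spark-100-105-1.8.11"
```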

Upload course notebooks to a Databricks shard

You can use bdc to upload all notebooks for a course to a Databricks shard.

bdc --upload shard-path [build-yaml]

Or, if you want to use a Databricks authentication profile other than DEFAULT:

bdc --upload --dbprofile profile shard-path [build-yaml]

--dbprofile (or -P) corresponds directly to the databricks command's --profile argument.
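For reference, a profile is just a named section in ~/.databrickscfg. The section name and token below are placeholders, not values tied to any particular course:

```ini
[trainers]
host = https://trainers.cloud.databricks.com
token = <your-personal-access-token>
```

With a section like that in place, bdc --upload --dbprofile trainers shard-path [build-yaml] authenticates using the trainers profile.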

This version of the command gets the list of source notebooks from the build file and uploads them to a shard using a layout similar to the build layout. You can then edit and test the notebooks in Databricks. When you're done editing, you can use bdc to download the notebooks again. (See below.)

shard-path is the path to the folder on the Databricks shard. For instance: /Users/[email protected]/Spark-ML-301. The folder must not exist in the shard. If it already exists, the upload will abort.

shard-path can be relative to your home directory. See Relative Shard Paths, below.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.

Uploads and build profiles: If two notebooks with separate profiles ("amazon" and "azure") map to the same dest value, bdc would overwrite one of them during the upload and would pick one arbitrarily on download. To avoid that, bdc adds an "az" or "am" qualifier to the uploaded file name. For instance, assume build.yaml has these two notebooks (and assume typical values in notebook_defaults):

  - src: 02-ETL-Process-Overview-az.py
    dest: ${target_lang}/02-ETL-Process-Overview.py
    only_in_profile: azure

  - src: 02-ETL-Process-Overview-am.py
    dest: ${target_lang}/02-ETL-Process-Overview.py
    only_in_profile: amazon

Both notebooks map to the same build destination. bdc --upload will upload 02-ETL-Process-Overview-az.py as 02-az-ETL-Process-Overview.py, and it will upload 02-ETL-Process-Overview-am.py as 02-am-ETL-Process-Overview.py.

bdc always applies the am or az qualifier when only_in_profile is specified, even if there are no destination conflicts. The qualifier is placed after any leading numerals in the destination file name; if there are no numerals, it's placed at the beginning.
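The naming rule can be sketched as a small shell function. This is only an illustration of the rule as described above, not bdc's actual implementation, and it assumes the leading numerals are followed by a hyphen:

```shell
# Illustrative sketch of the upload-name qualifier rule; not bdc's own code.
# Assumes a destination file name whose leading numerals end with a hyphen.
qualify() {
  name=$1 qual=$2
  digits=$(printf '%s' "$name" | sed -n 's/^\([0-9][0-9]*-\).*$/\1/p')
  if [ -n "$digits" ]; then
    # Numerals present: insert the qualifier right after them.
    printf '%s%s-%s\n' "$digits" "$qual" "${name#"$digits"}"
  else
    # No numerals: the qualifier goes at the beginning.
    printf '%s-%s\n' "$qual" "$name"
  fi
}

qualify "02-ETL-Process-Overview.py" az   # → 02-az-ETL-Process-Overview.py
qualify "Intro.py" am                     # → am-Intro.py
```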

Download course notebooks from a Databricks shard

You can use bdc to download all notebooks for a course from a Databricks shard.

bdc --download shard-path [build-yaml]

Or, if you want to use a Databricks authentication profile other than DEFAULT:

bdc --download --dbprofile profile shard-path [build-yaml]

--dbprofile (or -P) corresponds directly to the databricks command's --profile argument.

This version of the command downloads the contents of the specified Databricks shard folder to a local temporary directory. Then, for each downloaded file, bdc uses the build.yaml file to identify the original source file and copies the downloaded file over the original source.

shard-path is the path to the folder on the Databricks shard. For instance: /Users/[email protected]/Spark-ML-301. The folder must exist in the shard. If it doesn't exist, the download will abort.

shard-path can be relative to your home directory. See Relative Shard Paths, below.

build-yaml is the path to the course's build.yaml file, and it defaults to build.yaml in the current directory.

WARNING: If the build.yaml points to your cloned Git repository, ensure that everything is committed first. Don't download into a dirty Git repository. If the download fails or somehow screws things up, you want to be able to reset the Git repository to before you ran the download.

To reset your repository, use:

git reset --hard HEAD

This resets your repository back to the last-committed state.
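One way to enforce the clean-repository rule before downloading is a small safeguard you might script yourself; repo_is_clean is a suggested helper, not a bdc feature:

```shell
# Suggested safeguard, not part of bdc: check for a dirty working tree
# before running bdc --download into a cloned repository.
repo_is_clean() {
  # "git status --porcelain" prints nothing when the working tree is clean.
  [ -z "$(git -C "${1:-.}" status --porcelain 2>/dev/null)" ]
}

repo=$(mktemp -d)
git -C "$repo" init -q
if repo_is_clean "$repo"; then
  echo "clean: safe to run bdc --download"
fi
```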

Relative Shard Paths

--upload and --download support relative shard paths, allowing you to specify foo instead of, for instance, /Users/[email protected]/foo. To enable relative shard paths, you must do one of the following:

Set DB_SHARD_HOME

You can set the DB_SHARD_HOME environment variable (e.g., in your ~/.bashrc) to specify your home path on the shard. For example:

export DB_SHARD_HOME=/Users/[email protected]

Add a home setting to ~/.databrickscfg

You can also add a home variable to ~/.databrickscfg, in the DEFAULT section. The Databricks CLI command will ignore it, but bdc will honor it. For example:

[DEFAULT]
host = https://trainers.cloud.databricks.com
token = lsakdjfaksjhasdfkjhaslku89iuyhasdkfhjasd
home = /Users/[email protected]