2. The Build File
bdc uses a per-course build file that describes the course being built. This
file, conventionally called build.yaml, is a YAML file describing the files
that comprise a particular class. Each class that is to be built will have its
own build file.
For a template build file, see the sample build.yaml in the build tooling repository.
This page describes each section and configuration item in the build file.
- Include file preprocessing
- Tool versions
- Introduction to variable substitution, plus the variables section
- Course info
- A Note about Documents
- Output Generation
- Build Profiles
- Notebooks
- Bundles
- Variable Substitution
- Basic Mustache Syntax
As of version 1.30.0, bdc supports an include directive in build.yaml.
The syntax is:
#include "path"
This feature allows you to break your build.yaml into easily composed pieces.
Note:
- The includes are processed before the YAML is parsed.
- Includes can be nested. That is, a file you include can, itself, include other files, up to a maximum nesting level of 100.
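Because includes are expanded textually before the YAML is parsed, the preprocessing step can be sketched as a recursive, line-by-line expansion. This is a minimal sketch, not bdc's implementation: the name expand_includes is hypothetical, and resolving an include path relative to the directory of the file containing the directive is an assumption (it fits the ../common/cfg.yaml example below, but the rule isn't stated here).

```python
import os
import re

# A directive line looks like: #include "path"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')
MAX_NESTING = 100  # documented maximum nesting level


def expand_includes(path: str, depth: int = 0) -> str:
    """Return the text of path with every #include line replaced by the included text."""
    if depth > MAX_NESTING:
        raise RuntimeError(f"#include nesting deeper than {MAX_NESTING} levels at {path}")
    base_dir = os.path.dirname(path)
    pieces = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = INCLUDE_RE.match(line)
            if not match:
                pieces.append(line)
                continue
            # Assumption: include paths are relative to the including file's directory.
            included = os.path.normpath(os.path.join(base_dir, match.group(1)))
            text = expand_includes(included, depth + 1)
            if text and not text.endswith("\n"):
                text += "\n"  # keep the parent's next line on its own line
            pieces.append(text)
    return "".join(pieces)
```

The expanded string is what would then be handed to the YAML parser. Since the directive starts with "#", an unexpanded #include line would simply look like a YAML comment.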
Consider this build.yaml:
course_info:
  name: Dummy
  title: "Dummy course"
  version: 1.1.0
  type: self-paced
#include "../common/cfg.yaml"
notebooks:
  - src: 01-Introduction.py
  - src: 02-Create.py
  - src: 03-Append.py
  - src: 04-Upsert.py
  - src: 05-Streaming.py
  - src: 06-Optimization.py
  - src: 07-Architecture.py
  - src: 08-Capstone-Project.py
  - src: Includes/Classroom-Setup.py
    dest: "$target_lang/$notebook_type/Includes/$basename.$target_extension"
    master:
      footer:
        enabled: false
      heading:
        enabled: false
The ../common/cfg.yaml file might contain common settings used by all
build files within a particular curriculum set. For instance:
# common/cfg.yaml
# Minimum version of bdc necessary to parse this file.
bdc_min_version: "1.30"
master_parse_min_version: "1.20"
top_dbc_folder_name: $course_id
student_dbc: Lessons.dbc
keep_lab_dirs: true
notebook_type_name:
  answers: Solutions
  instructor: ''
  exercises: ''
src_base: notebooks
student_dir: ''
misc_files:
  -
    src: CHANGELOG.md
    dest_is_dir: true
    dest: .
notebook_defaults:
  dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
  master:
    enabled: true
    scala: true
    python: true
    answers: true
    instructor: false
    enable_templates: true
    instructor_notes: "Instructor-Notes/${target_basename}-notes.md"
Two required settings specify the minimum versions of bdc and of the master
parser that a given build file requires. Both tools use semantic versioning,
so version numbers are of the form major.minor.patch (e.g., 1.19.3). Since
patch versions should always be backward-compatible, bdc only looks at the
major and minor numbers. (Thus, the values "1.19.3", "1.19" and "1.19.0" are
equivalent: they all set the minimum version to "1.19".)
NOTE: These values must be quoted, so that the YAML parser doesn't interpret them as floating point numbers! (Unquoted, 1.20 would be read as the number 1.2.)
- bdc_min_version sets the minimum version of bdc required to run the build. Using an older version of bdc will cause an automatic build failure.
- master_parse_min_version sets the minimum version of the master parser required to run the build. Using an older version of master_parse will cause an automatic build failure.
The following example says that the build.yaml requires at least version
1.21 of bdc and version 1.14 of master_parse:
bdc_min_version: "1.21"
master_parse_min_version: "1.14"
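The major/minor comparison described above can be sketched in Python. This is an illustration, not bdc's code; major_minor and check_min_version are hypothetical names. It also shows why quoting matters: an unquoted 1.20 becomes the float 1.2, whose minor number reads as 2 rather than 20.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce 'major.minor[.patch]' to (major, minor); the patch level is ignored."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return int(parts[0]), int(parts[1])


def check_min_version(tool: str, installed: str, required: str) -> None:
    """Fail the build if the installed tool is older than the required version."""
    if major_minor(installed) < major_minor(required):
        raise RuntimeError(f"{tool} {installed} is older than the required {required}")


# "1.19.3", "1.19" and "1.19.0" all reduce to the same minimum version.
assert major_minor("1.19.3") == major_minor("1.19") == major_minor("1.19.0") == (1, 19)

# The values from the example above, checked against hypothetical installed versions.
check_min_version("bdc", "1.30.2", "1.21")
check_min_version("master_parse", "1.20.0", "1.14")

# Why quoting matters: unquoted, YAML turns 1.20 into the float 1.2.
print(major_minor(str(1.20)))  # (1, 2), not (1, 20)
```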
Certain build.yaml parameters permit variable substitution, using the
Python template string syntax. bdc supports certain hard-coded variables,
as noted in the various sections below.
You can also supply your own custom variables, which can be substituted in most places that support variables. Custom variables are also passed into the master parser, if master parsing is enabled.
The variable substitution syntax is powerful. See Variable Substitution for complete details.
To define your own variables, just include a variables section in your
build.yaml. Each field within that section defines a name/value pair,
available for substitution.
For example:
variables:
  author: Databricks
  revised: 2018-09-30
That section defines two substitutions: ${author}, which will be replaced
with the string "Databricks", and ${revised}, which will be replaced with
the string "2018-09-30".
The master parser allows Markdown cells to be processed as templates.
(See Notebook Markup for details on cell templates.)
Any variables you define in your variables section will be passed to the
master parser and will be available for substitution in your Markdown
cells, provided you enable Markdown cell templates.
The course_info section defines information about the course. It contains
the following fields:
- name: (REQUIRED) The name of the course (e.g., "Databricks-Delta"). The name should not have any white space, as it is used to construct file names.
- title: (OPTIONAL) A human-readable title for the course, which can contain white space (unlike name). Default: The value of name.
- version: (REQUIRED) The (semantic) version number of the course (e.g., "1.0.1").
- type: (REQUIRED) The course type. Legal values are "ilt" (for instructor-led materials) and "self-paced".
- copyright_year: (OPTIONAL) Copyright year of the course materials. Defaults to the current year.
- class_setup: (OPTIONAL) Path to the class setup instructions (for site managers of a given training site) on how to prepare the classroom environment. For Databricks classes, a survey exists for this purpose. However, for partners, this document summarizes the minimum needs. The path is relative to the directory containing the build file. If defined, this file will be copied to the top of the destination directory or directories. Only meaningful for ILT classes.
- schedule: (OPTIONAL) Path to a document that describes the recommended teaching schedule for the class. Only meaningful for ILT classes.
- prep: (OPTIONAL) Path to a Markdown document that outlines any instructor preparation that must be done before teaching the class. The file, if specified, is copied to InstructorFiles/Preparation.md under the target directory. An HTML version is also generated there. Only meaningful for ILT classes.
- deprecated: (OPTIONAL) If present and true, this field marks the course as deprecated (i.e., no longer used). Attempts to build the course will fail. Default: false.
Example sections:
course_info:
  name: Databricks-Delta
  title: Databricks Delta
  version: 1.0.0
  type: self-paced

course_info:
  name: Spark-ILT
  version: 2.1.2-SNAPSHOT
  type: ilt
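The rules above (required fields, no white space in name, the two legal type values, the defaults for title and copyright_year, and the deprecated flag) can be summarized as a small validation function. This is a hypothetical sketch operating on an already-parsed course_info mapping; normalize_course_info is not part of bdc, and bdc's real error messages may differ.

```python
import datetime

REQUIRED = ("name", "version", "type")
LEGAL_TYPES = {"ilt", "self-paced"}


def normalize_course_info(info: dict) -> dict:
    """Validate a parsed course_info section and fill in the documented defaults."""
    missing = [key for key in REQUIRED if key not in info]
    if missing:
        raise ValueError(f"course_info is missing required field(s): {', '.join(missing)}")
    if any(ch.isspace() for ch in info["name"]):
        raise ValueError("course_info.name must not contain white space")
    if info["type"] not in LEGAL_TYPES:
        raise ValueError(f"course_info.type must be one of {sorted(LEGAL_TYPES)}")
    if info.get("deprecated", False):
        raise ValueError(f"course {info['name']} is deprecated; refusing to build")
    result = dict(info)
    result.setdefault("title", info["name"])
    result.setdefault("copyright_year", datetime.date.today().year)
    result.setdefault("deprecated", False)
    return result


print(normalize_course_info({"name": "Spark-ILT", "version": "2.1.2-SNAPSHOT", "type": "ilt"})["title"])
# Spark-ILT
```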
bdc can copy documents into the build output directory, optionally
generating different kinds of output formats. This section describes
document-related configuration items and sections within the build.yaml
file.
bdc can process Markdown files, producing both HTML and PDF output. It
can also generate PDF output from HTML files or HTML templates.
When generating HTML and PDF from Markdown, bdc uses an internal HTML
stylesheet by default. However, you can override that stylesheet with
the markdown section:
markdown:
  html_stylesheet: path/to/my/stylesheet.css
If you specify markdown.html_stylesheet, the stylesheet you specify is
inserted, inline, into each HTML file that is generated from Markdown source.
Unless absolute, the path is assumed to be relative to the build file.
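The path rule and the inline insertion can be sketched as follows. This assumes the Markdown has already been converted to an HTML body fragment (the conversion itself is not shown), and both helper names are hypothetical rather than bdc functions.

```python
import os


def resolve_stylesheet(path: str, build_file: str) -> str:
    """Absolute paths are used as-is; relative ones are taken relative to the build file."""
    if os.path.isabs(path):
        return path
    build_dir = os.path.dirname(os.path.abspath(build_file))
    return os.path.normpath(os.path.join(build_dir, path))


def inline_stylesheet(html_body: str, css: str) -> str:
    """Embed the stylesheet text in a <style> element so the HTML file is self-contained."""
    return (
        "<html>\n<head>\n<style>\n" + css + "\n</style>\n</head>\n"
        "<body>\n" + html_body + "\n</body>\n</html>\n"
    )


print(resolve_stylesheet("path/to/my/stylesheet.css", "/courses/delta/build.yaml"))
# /courses/delta/path/to/my/stylesheet.css (on a POSIX system)
```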
Several settings help define the layout of the final built course.
- student_dbc: (OPTIONAL) The name of the DBC that contains student notebooks. Defaults to Labs.dbc.
- instructor_dbc: (OPTIONAL) The name of the DBC that contains instructor notebooks. Defaults to Instructor-Labs.dbc. Note that this DBC is only created if at least one instructor notebook is generated. See Notebooks for details.
- student_dir: (OPTIONAL) The name of the folder, relative to the top of the output directory, in which to store student files such as the generated student DBC. If explicitly set to the empty string (''), the student DBC will be written to the top-level output directory. This value must not be the same as instructor_dir. Default: StudentFiles.
- instructor_dir: (OPTIONAL) The name of the folder, relative to the top of the output directory, in which to store instructor files such as the generated instructor DBC. If explicitly set to the empty string (''), the instructor DBC will be written to the top-level output directory. This value must not be the same as student_dir. Default: InstructorFiles.
- keep_labs_dir: (OPTIONAL) While generating output notebooks, the build tools stash them in directories within the output directory. For example, if the student DBC is called Labs.dbc, then the tools will stash the notebooks in a Labs directory under student_dir. The DBC is then generated from that directory. If keep_labs_dir is false, that directory is removed after the corresponding DBC is built. If keep_labs_dir is true, that directory is not removed (which can be useful for debugging).
The default values are generally useful for an ILT class, where you want separate instructor and student areas and DBCs. A typical self-paced class might use these values:
student_dir: '' # DBC at the top level
student_dbc: Lessons.dbc
instructor_dir and instructor_dbc are untouched, but the notebooks
are configured so that no instructor notebooks are generated. Thus, the
instructor DBC will never be written.
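To make the defaults and the empty-string convention concrete, here is a hypothetical helper that computes where the DBCs and the stash directory would land. It assumes, as the keep_labs_dir description suggests, that the stash directory is named after the DBC file without its .dbc extension; dbc_layout is an illustrative name, not a bdc API.

```python
import os

# Defaults as documented above.
DEFAULTS = {
    "student_dbc": "Labs.dbc",
    "instructor_dbc": "Instructor-Labs.dbc",
    "student_dir": "StudentFiles",
    "instructor_dir": "InstructorFiles",
}


def dbc_layout(build: dict, output_dir: str) -> dict:
    """Compute output locations; an empty directory name means the top level."""
    cfg = {**DEFAULTS, **{k: v for k, v in build.items() if k in DEFAULTS}}
    if cfg["student_dir"] == cfg["instructor_dir"]:
        raise ValueError("student_dir and instructor_dir must not be the same")
    student_root = os.path.join(output_dir, cfg["student_dir"])
    instructor_root = os.path.join(output_dir, cfg["instructor_dir"])
    return {
        "student_dbc": os.path.join(student_root, cfg["student_dbc"]),
        # Assumed: notebooks are stashed in a directory named after the DBC.
        "student_stash": os.path.join(student_root, os.path.splitext(cfg["student_dbc"])[0]),
        "instructor_dbc": os.path.join(instructor_root, cfg["instructor_dbc"]),
    }


# The self-paced settings shown above put the student DBC at the top level.
layout = dbc_layout({"student_dir": "", "student_dbc": "Lessons.dbc"}, "build_output")
print(layout["student_dbc"])  # build_output/Lessons.dbc (on a POSIX system)
```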
A DBC file is a special kind of zip file containing JSON-encoded notebooks.
By default, when bdc generates the final DBC files, it places all notebooks
under a top-level directory named after the course. You can change that
strategy by setting top_dbc_folder_name.
The following variables can be substituted into this value:
Variable | Meaning |
---|---|
${course_name} | the course name, from course_info.name |
${course_version} | the course version, from course_info.version |
${course_id} | convenience variable: same as ${course_name}-${course_version} |
${profile} | the name of the current build profile (e.g., "amazon" or "azure"), if any |
your variables | any custom variables you define |
Examples:
top_dbc_folder_name: Course-${course_name}-${course_version}
top_dbc_folder_name: Lessons
top_dbc_folder_name: ${course_name} # same as the default
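Here is a rough sketch of how these built-in values could be assembled and substituted, again using string.Template as a stand-in for bdc's substitution engine. The function name top_dbc_folder, the precedence of built-ins over custom variables, and leaving ${profile} undefined when no profile is active are all assumptions.

```python
from string import Template
from typing import Optional


def top_dbc_folder(pattern: str, course_info: dict, profile: Optional[str] = None,
                   variables: Optional[dict] = None) -> str:
    values = dict(variables or {})  # custom variables from the variables section
    # Built-in values; here they take precedence over custom variables (an assumption).
    values["course_name"] = course_info["name"]
    values["course_version"] = course_info["version"]
    values["course_id"] = f"{course_info['name']}-{course_info['version']}"
    if profile is not None:
        values["profile"] = profile
    return Template(pattern).substitute(values)


info = {"name": "Databricks-Delta", "version": "1.0.0"}
print(top_dbc_folder("Course-${course_name}-${course_version}", info))
# Course-Databricks-Delta-1.0.0
```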
While we don't currently teach from slides, if you have some slides that
accompany your course, you can include those slides via the slides
section.
slides consists of a series of (src, dest) pairs, one for each file to be
copied. The src path is relative to the location of the build.yaml file.
The dest is relative to a Slides directory beneath the instructor directory.
See Output Generation for details on the instructor directory.
Another field, skip, can be set to true to cause the file to be skipped.
This is an alternative to commenting the section out.
Within the dest field, the following variable substitutions are available:
VARIABLE | DESCRIPTION |
---|---|
${basename} | the base file name of the src, WITHOUT the extension |
${filename} | the base file name of the src, WITH the extension |
${extension} | the src file's extension |
For example:
slides:
  -
    src: Slides/Welcome.pptx
    dest: Presentations/00-$filename
  -
    src: Slides/Architecture.pptx
    dest: Presentations/01-$filename
    skip: true
In this example, there are two slide decks, a "welcome" deck and an "architecture" deck.
The architecture deck is skipped, because skip is set to true.
The welcome deck is located at Slides/Welcome.pptx below the directory
containing the build.yaml. If instructor_dir is set to its default value,
that file will be copied to
<build_output_dir>/InstructorFiles/Slides/Presentations/00-Welcome.pptx.
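The skip flag and the dest substitutions can be sketched as a copy plan. This is illustrative only: copy_plan and expand_dest are hypothetical names, string.Template stands in for bdc's substitution, and the sketch assumes ${extension} is the extension without its leading dot, which the table above does not specify.

```python
import os
from string import Template


def expand_dest(src: str, dest: str) -> str:
    filename = os.path.basename(src)            # e.g. "Welcome.pptx"
    basename, ext = os.path.splitext(filename)  # ("Welcome", ".pptx")
    values = {"filename": filename, "basename": basename,
              "extension": ext.lstrip(".")}     # assumed: no leading dot
    return Template(dest).substitute(values)


def copy_plan(entries: list, target_root: str) -> list:
    """Return (src, destination) pairs, leaving out entries marked skip: true."""
    return [
        (e["src"], os.path.join(target_root, expand_dest(e["src"], e["dest"])))
        for e in entries
        if not e.get("skip", False)
    ]


slides = [
    {"src": "Slides/Welcome.pptx", "dest": "Presentations/00-$filename"},
    {"src": "Slides/Architecture.pptx", "dest": "Presentations/01-$filename", "skip": True},
]
for src, dst in copy_plan(slides, os.path.join("build_output", "InstructorFiles", "Slides")):
    print(src, "->", dst)
# Slides/Welcome.pptx -> build_output/InstructorFiles/Slides/Presentations/00-Welcome.pptx
```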
It's also possible to include data sets in the output directory. This section
is very similar to the slides section. It consists of a series of (src, dest)
pairs, one for each file to be copied. The src path is relative to the
location of the build.yaml file. The dest is relative to a generated
Datasets directory under the build output directory.
Another field, skip, can be set to true to cause the file to be skipped.
This is an alternative to commenting the section out.
Within the dest field, the following variable substitutions are available:
VARIABLE | DESCRIPTION |
---|---|
${basename} | the base file name of the src, without the extension |
${filename} | the base file name of the src, with the extension |
${extension} | the src file's extension |
WARNING: The directory that contains each dataset file must also contain
a LICENSE.md file that describes the license for the data and a README.md
file that briefly describes the data and where it came from. The build will
abort if those files are not present and non-empty.
For example:
datasets:
  -
    src: datasets/pets.csv
    dest: pets/$basename
  -
    src: datasets/autos.csv
    dest: autos/$basename
    skip: true
In this example, there are two data sets, autos.csv and pets.csv, both of
which reside in the datasets directory right beneath build.yaml.
The autos.csv dataset is ignored, because skip is set to true.
The pets.csv dataset is copied to <build_output>/Datasets/pets/pets.csv.
Its LICENSE.md and README.md files are copied to the same directory,
as are their (generated) HTML and PDF counterparts.
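The LICENSE.md/README.md requirement amounts to a pre-copy check on each dataset's source directory. A minimal sketch follows; check_dataset_docs is a hypothetical name, and treating a whitespace-only file as empty is an assumption about what "non-empty" means.

```python
import os

REQUIRED_DOCS = ("LICENSE.md", "README.md")


def check_dataset_docs(dataset_src: str) -> None:
    """Abort if the dataset's directory lacks a non-empty LICENSE.md or README.md."""
    directory = os.path.dirname(os.path.abspath(dataset_src))
    problems = []
    for name in REQUIRED_DOCS:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"{name} is missing")
            continue
        with open(path, encoding="utf-8") as f:
            if not f.read().strip():  # assumed: whitespace-only counts as empty
                problems.append(f"{name} is empty")
    if problems:
        raise RuntimeError(f"{dataset_src}: " + "; ".join(problems))
```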
The misc_files section provides the mechanism for copying any other kind of
non-notebook file into the build directory. (Notebooks are handled specially.)
misc_files consists of a list of source documents, along with instructions
on how to copy them. Each file section can have the following fields:
src: (REQUIRED) The path to the source file, relative to the directory
containing build.yaml.
dest: (REQUIRED) The destination path, relative to the top of the
build output directory (or the profile subdirectory, if
Build Profiles are enabled). A value of "." means
"top-level directory". This parameter can be a file or a directory. If the
destination does not have an extension, it is assumed to be a directory,
unless you set dest_is_dir to false.
The following substitutions are permitted within dest:
SUBSTITUTION | DESCRIPTION |
---|---|
${basename} | the base file name of the source, without the extension |
${filename} | the base file name of the source, with the extension |
${extension} | the file extension |
your variables | Any variables defined in the variables section, without prefix. |
dest_is_dir: (OPTIONAL) true indicates that dest is intended to be a
directory; false indicates that it is a file. Defaults to false.
You can't set this to true if the destination has an extension.
template: (OPTIONAL) true indicates that the source file is actually
a Mustache template, which allows you to use variable substitution and
conditional text based on variables. For a brief overview of Mustache,
see Basic Mustache Syntax, below.
If the file is not a text file (as determined by its extension), then
template cannot be set to true.
The following variables are made available to templates:
SUBSTITUTION | DESCRIPTION |
---|---|
course_info.<var> | Any variable from the course_info section, e.g., {{course_info.name}} |
variables.<var> | Any variable from the variables section, e.g., {{variables.myVar}} |
amazon | Set to "Amazon" (which also evaluates as true for conditional template logic) if the current build profile is "amazon". Set to '' (which also evaluates as false for conditional template logic) if the current build profile is not "amazon" or if build profiles are disabled. |
azure | Set to "Azure" (which also evaluates as true for conditional template logic) if the current build profile is "azure". Set to '' (which also evaluates as false for conditional template logic) if the current build profile is not "azure" or if build profiles are disabled. |
In addition to template processing, bdc performs other processing when
copying miscellaneous files.
- If src is an HTML file and dest is a directory, the HTML file is copied to the destination (after optionally being expanded from a template). Then, a PDF is generated from the HTML and placed in the same dest directory.
- If src is a Markdown file (that is, it has extension .md or .markdown) and dest is a directory, the Markdown file is copied to the destination (after optionally being expanded from a template). Then, an HTML version is generated and copied to dest. Finally, a PDF is generated from the HTML and copied to dest.
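The misc_files rules can be sketched as a function that predicts which files end up in the destination. This is a hypothetical illustration, not bdc code: the TEXT_EXTENSIONS set is invented for the example, the .htm spelling is an assumption, and inferring dest_is_dir from the absence of an extension (when it isn't given) follows the description of dest rather than any documented algorithm.

```python
import os
from typing import List, Optional

# Hypothetical list of "text" extensions for the template check.
TEXT_EXTENSIONS = {".md", ".markdown", ".html", ".htm", ".txt", ".csv", ".css"}
MARKDOWN_EXTENSIONS = {".md", ".markdown"}
HTML_EXTENSIONS = {".html", ".htm"}


def misc_file_outputs(src: str, dest: str, dest_is_dir: Optional[bool] = None,
                      template: bool = False) -> List[str]:
    src_ext = os.path.splitext(src)[1].lower()
    dest_has_ext = os.path.splitext(dest)[1] != ""
    if dest_is_dir is None:
        dest_is_dir = not dest_has_ext  # no extension: treat dest as a directory
    if dest_is_dir and dest_has_ext:
        raise ValueError("dest_is_dir cannot be true when dest has an extension")
    if template and src_ext not in TEXT_EXTENSIONS:
        raise ValueError("template can only be true for text files")

    if not dest_is_dir:
        # The extra HTML/PDF outputs are only described for directory destinations.
        return [dest]
    copied = os.path.join(dest, os.path.basename(src))
    stem = os.path.splitext(copied)[0]
    if src_ext in MARKDOWN_EXTENSIONS:
        return [copied, stem + ".html", stem + ".pdf"]  # Markdown -> HTML -> PDF
    if src_ext in HTML_EXTENSIONS:
        return [copied, stem + ".pdf"]                  # HTML -> PDF
    return [copied]


print(misc_file_outputs("CHANGELOG.md", ".", dest_is_dir=True))
# ['./CHANGELOG.md', './CHANGELOG.html', './CHANGELOG.pdf'] (on a POSIX system)
```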
Build profiles allow you to build multiple versions of each course, possibly with conditional content that only appears in a specific profile.
Prior to bdc version 1.30.0, only "amazon" and "azure" profiles were
supported, and you enabled those profiles by setting use_profiles to true
in your build.yaml.
As of bdc 1.30.0 and master_parse 1.20.0, the build tools support
arbitrary build profiles. If defined in build.yaml, each build profile:
- triggers a separate build
- is built into its own output subdirectory
- enables build profile-specific notebook tags
- enables build profile-specific Mustache tags
NOTE: If you don't define any profiles, then your course is only built once, and profile-specific notebook and Mustache tags don't exist.
An example best describes this feature:
profiles:
- amazon: Amazon
- azure: Azure
- sklearn: sklearn
With that section in your build.yaml, bdc will build the course three
times, producing amazon, azure and sklearn subdirectories in the build
output directory.
You can include or exclude cells, based on the active profile, with the
PROFILES tag. For instance:
%md
-- PROFILES: amazon, azure
This Markdown cell will only be included in the output notebooks for the
"azure" and "amazon" profiles. It will be suppressed in the "sklearn" build.
In addition, a Mustache variable will be defined for each profile, as shown below:
ACTIVE PROFILE | {{azure}} | {{amazon}} | {{sklearn}} |
---|---|---|---|
azure | Substitutes as "Azure". Evaluates to True | Substitutes as "". Evaluates to False | Substitutes as "". Evaluates to False |
amazon | Substitutes as "". Evaluates to False | Substitutes as "Amazon". Evaluates to True | Substitutes as "". Evaluates to False |
sklearn | Substitutes as "". Evaluates to False | Substitutes as "". Evaluates to False | Substitutes as "sklearn". Evaluates to True |
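Both mechanisms follow a simple pattern, sketched below under stated assumptions: the profiles list is taken as parsed from the YAML above (a list of one-entry mappings), each profile's Mustache variable holds its display value only when that profile is active, and a PROFILES tag keeps a cell only in the listed profiles. profile_context and keep_cell are hypothetical names; this is not bdc's or master_parse's code.

```python
from typing import Dict, List, Optional


def profile_context(profiles: List[Dict[str, str]], active: Optional[str]) -> Dict[str, str]:
    """Mustache values per profile: the display value if active, else '' (falsy)."""
    context = {}
    for entry in profiles:
        for name, display in entry.items():
            context[name] = display if name == active else ""
    return context


def keep_cell(profiles_tag: Optional[str], active: Optional[str]) -> bool:
    """A cell without a PROFILES tag is always kept; otherwise only in listed profiles."""
    if profiles_tag is None:
        return True
    listed = {p.strip() for p in profiles_tag.split(",") if p.strip()}
    return active in listed


profiles = [{"amazon": "Amazon"}, {"azure": "Azure"}, {"sklearn": "sklearn"}]
print(profile_context(profiles, "azure"))
# {'amazon': '', 'azure': 'Azure', 'sklearn': ''}
```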
Backward Compatibility
For backward compatibility, build.yaml still supports:
use_profiles: true
This setting is equivalent to:
profiles:
- amazon: Amazon
- azure: Azure
The AMAZON_ONLY and AZURE_ONLY cell tags are still supported, though you
should prefer the newer PROFILES tag.
See also only_in_profile in the Notebooks section.
Source notebooks listed in the build.yaml are parsed, run through the master
parser, converted into multiple output notebooks, and, ultimately, gathered
into a single Databricks DBC file for easy import.
This section discusses the various notebook-related settings in build.yaml.
Note that DBC file generation is discussed in Output Generation.
src_base defines the root location of the notebooks. Use . if the notebook
locations are relative to the directory containing build.yaml. Otherwise,
specify the location as a relative path. Each notebook's src attribute
will be appended to the value of src_base to locate the notebook file.
Examples:
src_base: . # notebooks are under the directory containing the build file
src_base: ../../modules # notebooks are under the "modules" directory
The notebooks section is a list of notebooks to be processed and included
in the course. Each notebook is parsed and stored in the output DBC file(s).
Optionally, source notebooks can be processed by the master parser,
producing multiple output notebooks.
The notebooks are assumed to be in source-export format.
WARNING: The notebooks should be encoded in ASCII or UTF-8. Other encodings (e.g., ISO-8859-1 or CP-1252) might cause the build to abort.
Each notebook in the notebooks section can have the following fields.
src: (REQUIRED) The path to the notebook, relative to src_base.
dest: (REQUIRED) The destination path within the DBC file and within the student lab directory. (See Output Generation.) For notebooks not processed by the master parser, this destination is the path to which to copy the source notebook.
For notebooks that are to be run through the master parser (see below), the destination format depends on how many different output languages are being generated. If the master parser is generating output for just a single target language (such as Python), the destination should be a directory.
If the master parser is generating output for multiple target languages (e.g.,
Scala and Python), then the pattern must contain the ${target_lang}
substitution and should also contain the ${target_extension} substitution,
to differentiate the destination.
In short:
- If you specify ${target_lang} in the dest value, the target master parse language is substituted, for each language-generated notebook.
- If you don't specify ${target_lang}, and there are multiple languages selected in the "master" section, you'll get an error.
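The rule in the list above can be sketched as follows. This is a hypothetical helper, not bdc code; the mapping from language to ${target_extension} is an assumption based on the usual source-export extensions, and string.Template's safe_substitute stands in for bdc's substitution so that other variables (such as $notebook_type) are left for a later pass.

```python
from string import Template
from typing import List

# Assumed language -> extension mapping; bdc's real values may differ.
EXTENSIONS = {"Scala": "scala", "Python": "py", "R": "r", "SQL": "sql"}


def master_dests(dest: str, languages: List[str]) -> List[str]:
    """Expand dest once per target language, rejecting ambiguous patterns."""
    uses_lang = "$target_lang" in dest or "${target_lang}" in dest
    if len(languages) > 1 and not uses_lang:
        raise ValueError(
            f"dest {dest!r} must contain ${{target_lang}} "
            f"when {len(languages)} languages are generated"
        )
    return [
        Template(dest).safe_substitute(target_lang=lang, target_extension=EXTENSIONS[lang])
        for lang in languages
    ]


print(master_dests("${target_lang}/Introduction.${target_extension}", ["Scala", "Python"]))
# ['Scala/Introduction.scala', 'Python/Introduction.py']
```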
For example, here's a sample entry for a master-parsed notebook:
src: Introduction.py
dest: ${target_lang}/Introduction.${target_extension}
master:
  enabled: true
  scala: true
  python: true
Here is one for a non-master parsed notebook (i.e., one that is just copied):
src: Introduction.py
dest: $filename
(If this seems confusing, just try different variations, set keep_labs_dir
to true, and examine the output directory after running a build. The behavior
will become clear.)
Within the dest field, the following substitutions are always honored:
VARIABLE | DESCRIPTION |
---|---|
${basename} | the base file name of the src, without the extension |
${filename} | the base file name of the src, with the extension |
${extension} | the src file's extension |
In addition, if master parsing is enabled for the notebook, the following substitutions are also permitted:
VARIABLE | DESCRIPTION |
---|---|
${target_lang} | the output notebook's language (e.g., "Scala", "Python", etc.) |
${target_extension} | the output notebook's extension, which may differ from the source extension |
${notebook_type} | the notebook type ("exercises", "answers", "instructor"). Also see notebook_type_name, below. |
skip: (OPTIONAL) If set to true, the notebook is skipped. Defaults to false.
Setting skip to true is a convenient way to ignore a notebook without
commenting it out. (You can also comment it out, if you prefer.)
upload_download: (OPTIONAL) true if this notebook should be uploaded and downloaded when
the bdc upload (--upload) or download (--download) command is specified.
Defaults to true.
Setting this value to false is useful (and, often, necessary) if you're
double-processing a notebook for some reason.
If include_in_build is true (the default), the notebook is included in
the output build. If include_in_build is false, the notebook is omitted
from the build.
But, if upload_download is still true (which is the default), the notebook
will be uploaded and downloaded, using the dest value to determine what to
call it on the Databricks instance.
This feature allows you to include notebooks in your build.yaml and have
them uploaded to and downloaded from your Databricks workspace, but have them
excluded from the build output.
only_in_profile: Mark the notebook as either 'amazon' or 'azure', indicating that it is
Amazon-only or Azure-only. If this value is set, the master parser must be
enabled and use_profiles must be true. (See Build Profiles.)
The master subsection, if present and enabled within a notebook, marks the
notebook as a master notebook to be run through the master parser. This
section contains the configuration parameters for the master parser, telling it
how to process the notebook.
If the master section is missing or disabled, the source notebook is just
copied to the output directory.
The following parameters are supported.
NOTE: There's no option to enable or disable generation of exercises notebooks. Those notebooks are always generated, if master parsing is enabled.
enabled: (OPTIONAL) true to enable master parsing, false to disable it.
Default: false.
The easiest way to enable master parsing with all the defaults is:
master:
  enabled: true
answers: (OPTIONAL) true to enable generation of the answers notebooks, false
to disable generation of answers notebooks. Default: true.
instructor: (OPTIONAL) true to enable generation of the instructor notebooks, false
to disable generation of instructor notebooks. Default: true.
scala: (OPTIONAL) true to enable generation of Scala notebooks, false
to disable generation of Scala notebooks. Default: true.
python: (OPTIONAL) true to enable generation of Python notebooks, false
to disable generation of Python notebooks. Default: true.
OPTIONAL: true to enable generation of R notebooks, false
to disable generation of R notebooks. Default: false.
OPTIONAL: true to enable generation of SQL notebooks, false
to disable generation of SQL notebooks. Default: false.
instructor_notes: (OPTIONAL) Available starting in bdc 1.30.0.
If set, this value specifies a path, relative to the build output directory (or the build output profile subdirectory), for the consolidated instructor notes file for each notebook. If set, the master parser will consolidate all instructor note cells in each notebook into a single Markdown file for the notebook. The final per-notebook Markdown files will also be converted to HTML and to PDF.
This field supports limited variable substitution. In addition to any
notebook variables, you can substitute ${target_basename}, which is the
base file name of the destination notebook. For instance:
notebook_defaults:
  master:
    enabled: true
    python: true
    scala: true
    enable_templates: true
    instructor_notes: "Instructor-Notes/${target_basename}-notes.md"
notebooks:
  - src: Foo.py
    dest: "$target_lang/$notebook_type/01-Getting-Started.$target_extension"
In that example, for notebook Foo.py:
- the instructor notes will be written to <output-dir>/Instructor-Notes/01-Getting-Started-notes.md,
- the corresponding HTML will be written to <output-dir>/Instructor-Notes/01-Getting-Started-notes.html, and
- the corresponding PDF will be written to <output-dir>/Instructor-Notes/01-Getting-Started-notes.pdf.
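The derivation of those three paths from ${target_basename} can be sketched like this. It is a hypothetical helper (instructor_notes_paths is not a bdc function) and uses string.Template in place of bdc's substitution.

```python
import os
from string import Template
from typing import Tuple


def instructor_notes_paths(pattern: str, dest_notebook: str,
                           output_dir: str) -> Tuple[str, str, str]:
    # ${target_basename}: the destination notebook's file name without its extension.
    target_basename = os.path.splitext(os.path.basename(dest_notebook))[0]
    notes = Template(pattern).safe_substitute(target_basename=target_basename)
    md = os.path.join(output_dir, notes)
    stem = os.path.splitext(md)[0]
    return md, stem + ".html", stem + ".pdf"  # Markdown plus its HTML and PDF renderings


for path in instructor_notes_paths("Instructor-Notes/${target_basename}-notes.md",
                                   "Python/01-Getting-Started.py", "out"):
    print(path)
# out/Instructor-Notes/01-Getting-Started-notes.md
# out/Instructor-Notes/01-Getting-Started-notes.html
# out/Instructor-Notes/01-Getting-Started-notes.pdf
```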
enable_templates: (OPTIONAL) If true, then Markdown cells will be processed
as templates (see Notebook Markup for details on cell templates). Otherwise,
they won't. Default: false.
OPTIONAL: The encoding to use when reading the master notebook. Default: UTF-8.
OPTIONAL: The encoding to use when writing the output notebooks. Default: UTF-8.
heading is an OPTIONAL subsection that defines whether to generate a notebook
heading cell in each output notebook. Heading supports two fields:
FIELD | MEANING |
---|---|
path | Path to a notebook heading file. The path is relative to the build file directory. The file must be HTML or Markdown and is inserted into a %md-sandbox cell at the top of each notebook. If not specified, or if set to "DEFAULT", an internal "Databricks Academy" default is used. |
enabled | Whether or not to insert the heading. true by default. One use case for false is to override notebook defaults (see below) for a notebook. |
Example:
src: Foo.scala
dest: ${target_lang}/Foo.${target_extension}
master:
  enabled: true
  heading:
    path: misc_files/heading.md
footer is an OPTIONAL subsection that defines whether to generate a notebook
footer cell in each output notebook. Footer supports two fields:
FIELD | MEANING |
---|---|
path | Path to a notebook footer file. The path is relative to the build file directory. The file must be HTML or Markdown and is inserted into a %md-sandbox cell at the bottom of each notebook. If not specified, or if set to "DEFAULT", an internal default (a copyright cell) is used. |
enabled | Whether or not to insert the footer. true by default. One use case for false is to override notebook defaults (see below) for a notebook. |
Example:
src: Foo.scala
dest: ${target_lang}/Foo.${target_extension}
master:
  enabled: true
  footer:
    path: misc_files/footer.md
This section defines the value of the built-in ${notebook_type} variable.
As the master parser processes a notebook, it can generate three basic types
of notebooks: exercises, answers and instructor notebooks. In some places,
notably dest values, you can use ${notebook_type} to substitute the
current value. For example, consider this notebook definition:
notebooks:
  -
    src: 01-Intro.py
    dest: $target_lang/01-Intro-$notebook_type.$target_extension
    master:
      enabled: true
With that definition, the master parser will create six notebooks:
- A Scala exercises notebook
- A Python exercises notebook
- A Scala answers notebook
- A Python answers notebook
- A Scala instructor notebook
- A Python instructor notebook
As it generates each of those notebooks, it will expand the dest pattern
accordingly. It will generate the following output notebooks:
OUTPUT NOTEBOOK | GENERATED PARTIAL PATH |
---|---|
Scala exercises notebook | Scala/01-Intro-exercises.scala (in student directory) |
Python exercises notebook | Python/01-Intro-exercises.py (in student directory) |
Scala answers notebook | Scala/01-Intro-answers.scala (in student directory) |
Python answers notebook | Python/01-Intro-answers.py (in student directory) |
Scala instructor notebook | Scala/01-Intro.scala (in instructor directory) |
Python instructor notebook | Python/01-Intro.py (in instructor directory) |
From that example, we can see that the default values for ${notebook_type} are:
NOTEBOOK TYPE | DEFAULT ${notebook_type} VALUE |
---|---|
exercises | "exercises" |
answers | "answers" |
instructor | "" |
The notebook_type_name section lets you change one or all of those
values. For instance, suppose we wanted a layout where the exercises notebooks
are at the top level of the labs directory and the answers notebooks
are below them, in a "Solutions" subdirectory. But we still want the
instructor notebooks at the top level of the instructor labs directory.
We can achieve that by changing our notebook destination and by adjusting
the notebook type names, as shown:
notebook_type_name:
  answers: Solutions
  instructor: ''
  exercises: ''
notebooks:
  -
    src: 01-Intro.py
    dest: $target_lang/$notebook_type/01-Intro.$target_extension
    master:
      enabled: true
With this change, we'll get the following layout for our generated notebooks:
OUTPUT NOTEBOOK | GENERATED PARTIAL PATH |
---|---|
Scala exercises notebook | Scala/01-Intro.scala (in student directory) |
Python exercises notebook | Python/01-Intro.py (in student directory) |
Scala answers notebook | Scala/Solutions/01-Intro.scala (in student directory) |
Python answers notebook | Python/Solutions/01-Intro.py (in student directory) |
Scala instructor notebook | Scala/01-Intro.scala (in instructor directory) |
Python instructor notebook | Python/01-Intro.py (in instructor directory) |
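The effect of notebook_type_name on the six generated paths can be reproduced with a short sketch. Two assumptions are labeled in the code: the language-to-extension mapping, and that an empty path segment (produced when a type name is '') collapses, which os.path.normpath models here. generated_paths is a hypothetical helper, not part of bdc.

```python
import os
from string import Template
from typing import Dict, Optional, Tuple

DEFAULT_TYPE_NAMES = {"exercises": "exercises", "answers": "answers", "instructor": ""}
EXTENSIONS = {"Scala": "scala", "Python": "py"}  # assumed extensions


def generated_paths(dest: str, type_names: Optional[Dict[str, str]] = None
                    ) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (language, notebook type) to (output area, expanded partial path)."""
    names = {**DEFAULT_TYPE_NAMES, **(type_names or {})}
    result = {}
    for ntype in ("exercises", "answers", "instructor"):
        area = "instructor" if ntype == "instructor" else "student"
        for lang, ext in EXTENSIONS.items():
            path = Template(dest).substitute(
                target_lang=lang, target_extension=ext, notebook_type=names[ntype])
            # Assumed: empty segments collapse ("Scala//01-Intro.scala" -> "Scala/01-Intro.scala").
            result[(lang, ntype)] = (area, os.path.normpath(path))
    return result


paths = generated_paths("$target_lang/$notebook_type/01-Intro.$target_extension",
                        {"answers": "Solutions", "instructor": "", "exercises": ""})
for (lang, ntype), (area, path) in paths.items():
    print(f"{lang} {ntype}: {path} (in {area} directory)")
```

Calling generated_paths with the earlier pattern, $target_lang/01-Intro-$notebook_type.$target_extension, and no overrides yields the default names instead, for example Scala/01-Intro-answers.scala.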
Here's an example of a notebooks section:
notebooks:
  - src: notebooks/Delta/01-Introduction.py
    dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
    master:
      enabled: true
      scala: true
      python: true
      answers: true
      instructor: false
  - src: notebooks/02-Architecture.py
    dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
    master:
      enabled: true
      scala: true
      python: true
      answers: true
      instructor: false
  - src: notebooks/03-Tuning.py
    dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
    master:
      enabled: true
      scala: true
      python: true
      answers: true
      instructor: false
  - src: notebooks/04-Debugging.py
    dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
    master:
      enabled: true
      scala: true
      python: true
      answers: true
      instructor: false
  - src: notebooks/05-Capstone-Project.py
    dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
    master:
      enabled: true
      scala: true
      python: true
      answers: true
      instructor: false
There's a lot of repetition in that configuration. In the next section, we'll see how to factor that out.
If you find yourself repeating a lot of configuration data in your `notebooks` section, you can pull the repeated elements out and put them in a special `notebook_defaults` section. `notebook_defaults` defines default values for any notebook. You can override those values on a per-notebook basis if you want. `notebook_defaults` can contain default values for the `master` section, the `heading` section, the `footer` section, and the `dest` value.
This time, let's start with an example. Let's see how a `notebook_defaults` section can simplify the complete notebooks section example above.
notebook_defaults:
  dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
  master:
    enabled: true
    scala: true
    python: true
    answers: true
    instructor: false
notebooks:
  - src: notebooks/Delta/01-Introduction.py
  - src: notebooks/02-Architecture.py
  - src: notebooks/03-Tuning.py
  - src: notebooks/04-Debugging.py
  - src: notebooks/05-Capstone-Project.py
Notice that the `dest` and `master` items are now specified only once, in the `notebook_defaults` section. This greatly simplifies the list of notebooks.
You can also override the settings on a per-notebook basis. For example, suppose we want to enable instructor notebooks for everything except the last notebook (the "capstone project"). We can do that by changing `notebook_defaults` to enable instructor notebooks, then overriding that default for the last notebook only:
notebook_defaults:
  dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
  master:
    enabled: true
    scala: true
    python: true
    answers: true
    instructor: true
notebooks:
  - src: notebooks/Delta/01-Introduction.py
  - src: notebooks/02-Architecture.py
  - src: notebooks/03-Tuning.py
  - src: notebooks/04-Debugging.py
  - src: notebooks/05-Capstone-Project.py
    master:
      instructor: false
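The override above only sets `instructor`, which suggests that per-notebook settings are merged into the defaults key by key rather than replacing the whole `master` section. The following minimal Python sketch shows such a recursive merge. It assumes this is how bdc combines `notebook_defaults` with each notebook entry. `merge_settings` is a hypothetical helper, not part of bdc.

```python
import copy
from typing import Any, Dict


def merge_settings(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay one notebook's settings on notebook_defaults.

    Nested dictionaries (such as master) are merged key by key; any other
    value in overrides simply replaces the default.
    """
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_settings(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


notebook_defaults = {
    "dest": "$target_lang/$notebook_type/$notebook_type/$basename.$target_extension",
    "master": {"enabled": True, "scala": True, "python": True,
               "answers": True, "instructor": True},
}
capstone = {"src": "notebooks/05-Capstone-Project.py",
            "master": {"instructor": False}}

merged = merge_settings(notebook_defaults, capstone)
print(merged["master"])
# {'enabled': True, 'scala': True, 'python': True, 'answers': True, 'instructor': False}
```

The defaults dictionary is deep-copied, so overriding one notebook never changes the settings seen by the others.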
`notebook_defaults` can also contain a `variables` section. For instance:
notebook_defaults:
  dest: '$target_lang/$notebook_type/$notebook_type/$basename.$target_extension'
  master:
    enabled: true
    scala: true
    python: true
    answers: true
    instructor: true
  variables:
    suffix: ${notebook_type[0]}
Variables defined in `notebook_defaults` are evaluated when each output notebook is generated. They are also passed to the master parser if templates are enabled. Variables defined here override any variables defined in the global `variables` section. However, they cannot override built-in variables (such as `${notebook_type}` or `${course_id}`). Any attempt to do so is ignored.
For some courses (e.g., self-paced), it's useful to generate an output bundle once the build is complete. `bdc` will do that for you if you include a `bundle` section in `build.yaml`.
A bundle is just a zip file containing other files. Currently, a bundle cannot contain a directory; it can only contain files. (That restriction may be lifted in the future, if the need arises.)
The `bundle` section consists of a series of (`src`, `dest`) pairs. The `src` is a file from the build output directory that is to be copied into the zip file. The `dest` is the name (and, if desired, path) within the zip file.
If build profiles are being used, `bdc` will generate one bundle for each profile: one bundle for "amazon" and another for "azure". If build profiles are not being used, `bdc` will generate just one bundle.
Formally, a bundle has the following fields:

`zipfile`: (OPTIONAL) The name of the zip file to be generated. This is not a path; it's a simple file name. The zip file is generated in the top build directory (if build profiles aren't being used) or in each profile directory (if build profiles are used).
The following variables are available for substitution within this value:
| VARIABLE | DESCRIPTION |
|---|---|
| `${course_name}` | the name of the course, from `course_info.name` |
| `${course_version}` | the course version, from `course_info.version` |
| `${profile}` | the profile ("amazon" or "azure"), or "" if profiles aren't enabled |
If `zipfile` is not defined, `bdc` will use `${course_name}-${course_version}.zip`.
`files`: The list of (`src`/`dest`) pairs to be zipped up. If the list is empty, no bundle is generated. `src` is relative to the top-level build directory (if build profiles aren't being used) or to the profile directory (if profiles are being used).
Within `dest`, the following substitutions are permitted:

| VARIABLE | DESCRIPTION |
|---|---|
| `${basename}` | the base file name of the `src`, without the extension |
| `${filename}` | the base file name of the `src`, with the extension |
| `${extension}` | the `src` file's extension |
| `${profile}` | the profile ("amazon" or "azure"), or "" if profiles aren't enabled |
| your variables | variables from the `variables` section |
An example will help clarify this section:
bundle:
  zipfile: course.zip
  files:
    - src: 00_README.pdf
      dest: $filename
    - src: Labs.dbc
      dest: Lessons.dbc
In this example, the file `00_README.pdf` will be copied into the zip file under the same name, and the `Labs.dbc` file will be copied into the zip file as `Lessons.dbc`. Instead of the default zip file name, `bdc` will use `course.zip`.
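The following Python sketch shows one way to produce such a bundle with the standard `zipfile` module. It is an illustration of the rules above, not bdc's implementation. `build_bundle` is a hypothetical helper, and several behaviors are assumptions: `${extension}` is taken to exclude the leading dot, user-defined variables from the `variables` section are not handled (an unknown variable raises `KeyError`), and the zip file is written into the build (or profile) directory that is passed in.

```python
import os
import zipfile
from string import Template
from typing import Dict, List, Optional


def build_bundle(build_dir: str, files: List[Dict[str, str]],
                 course_name: str, course_version: str,
                 zipfile_name: Optional[str] = None,
                 profile: str = "") -> Optional[str]:
    """Zip (src, dest) pairs from build_dir; return the zip path, or None."""
    if not files:
        return None  # An empty file list means no bundle is generated.
    name_template = zipfile_name or "${course_name}-${course_version}.zip"
    zip_name = Template(name_template).substitute(
        course_name=course_name, course_version=course_version, profile=profile)
    zip_path = os.path.join(build_dir, zip_name)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for entry in files:
            src = entry["src"]  # relative to build_dir
            filename = os.path.basename(src)
            basename, ext = os.path.splitext(filename)
            dest = Template(entry["dest"]).substitute(
                basename=basename, filename=filename,
                extension=ext.lstrip("."), profile=profile)
            # Only files are added; bundles cannot contain directories.
            zf.write(os.path.join(build_dir, src), arcname=dest)
    return zip_path
```

Called with the two entries from the example and `zipfile_name="course.zip"`, this sketch produces `course.zip` containing `00_README.pdf` and `Lessons.dbc`.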
Many (but not all) items in a `build.yaml` file support variable substitution. This section discusses that feature.
Variables currently come from several places:
- Some variables are built into `bdc`, such as `${notebook_type}`, `${basename}`, and others.
- You can define build-wide variables of your own in the `variables` section in `build.yaml`. (These variables cannot override built-in variables.) For example, if you define the following `variables` section, you can substitute `${foo}` wherever custom variables are supported:

  variables:
    foo: This string will replace ${foo}

- You can define per-notebook variables in a `variables` section in each notebook. These variables can also override build-wide globals on a per-notebook basis, though they cannot override `bdc` built-ins.
- You can define variables for all notebooks in the `notebook_defaults` section.
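The precedence rules in the list above can be modeled with Python's `collections.ChainMap`, which looks a key up in each mapping in turn and returns the first hit. This is a sketch, not bdc's code. It assumes the lookup order is built-ins first, then per-notebook variables, then `notebook_defaults` variables, then build-wide variables. The relative order of per-notebook and `notebook_defaults` variables is inferred from the "defaults can be overridden per notebook" rule. `resolve_variables` and the sample values are hypothetical.

```python
from collections import ChainMap
from typing import Mapping


def resolve_variables(builtins: Mapping[str, str],
                      notebook_vars: Mapping[str, str],
                      default_vars: Mapping[str, str],
                      global_vars: Mapping[str, str]) -> ChainMap:
    """Combine variable sources; earlier mappings win on lookup.

    Built-ins come first, so user-defined variables can never override them.
    """
    return ChainMap(dict(builtins), dict(notebook_vars),
                    dict(default_vars), dict(global_vars))


variables = resolve_variables(
    builtins={"notebook_type": "answers", "basename": "01-Intro"},
    notebook_vars={"foo": "per-notebook foo"},
    default_vars={"foo": "default foo", "suffix": "a"},
    global_vars={"foo": "global foo", "notebook_type": "ignored"},
)
print(variables["foo"])            # per-notebook foo
print(variables["notebook_type"])  # answers
```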
See the sample build.yaml for full details.
The variable substitution syntax is Unix shell-like:
- `$var` substitutes the value of a variable called "var"
- `${var}` substitutes the value of a variable called "var"
The second form is useful when you need to ensure that a variable's name doesn't get mashed together with a following non-whitespace string, e.g.:
- `${var}foo` substitutes the value of "var" preceding the string "foo"
- `$varfoo` attempts to substitute the value of "varfoo"
To escape a `$`, use `$$` or `\$`.
To escape a backslash, use `\\`.
Legal variable names consist of alphanumeric and underscore characters only.
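To make these rules concrete, here is a minimal Python sketch of the basic `$var` / `${var}` syntax with the `$$`, `\$` and `\\` escapes. It illustrates the documented behavior and is not bdc's parser. `substitute` is a hypothetical function. Raising `KeyError` for an undefined variable is an assumption, since bdc's handling of undefined variables isn't described here.

```python
import re
from typing import Mapping

# One alternative per token: escapes first, then ${var}, then $var.
_TOKEN = re.compile(
    r"\$\$"                               # $$  -> literal $
    r"|\\\$"                              # \$  -> literal $
    r"|\\\\"                              # \\  -> literal backslash
    r"|\$\{(?P<braced>[A-Za-z0-9_]+)\}"   # ${var}
    r"|\$(?P<bare>[A-Za-z0-9_]+)"         # $var (name runs as far as it can)
)


def substitute(template: str, variables: Mapping[str, str]) -> str:
    """Expand $var and ${var} references and process escapes."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token in ("$$", "\\$"):
            return "$"
        if token == "\\\\":
            return "\\"
        name = match.group("braced") or match.group("bare")
        return variables[name]

    return _TOKEN.sub(replace, template)


print(substitute("${var}foo", {"var": "X"}))  # Xfoo
```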
Variables can be subscripted and sliced, Python-style, as long as they use the brace (`${var}`) syntax.
Examples:
${foo[0]}
${foo[-1]}
${foo[2:3]}
${foo[:]}
${foo[:-1]}
${foo[1:]}
Subscripts are interpreted as in Python code, except that the "stride" capability isn't supported. (That is, you cannot use `${foo[0:-1:2]}` to slice through a value with index jumps of 2.)
One difference: if the final subscript is too large, it's sized down. For instance, given the variable `foo` set to `"ABCDEF"`, the substitution `${foo[100]}` yields `"F"`, and the substitution `${foo[1:10000]}` yields `"BCDEF"`. As a special case, subscripting an empty variable always yields an empty string, regardless of the subscript.
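Here is a Python sketch of the subscript rules above: Python-style indexing and slicing without a stride, an oversized index sized down, and an empty value always producing an empty string. `subscript_value` is a hypothetical function. Clamping out-of-range *negative* indexes to the first character is an assumption, because the rules above only describe oversized subscripts.

```python
import re

# The bracketed part of ${var[...]}: an index, or a start:stop slice (no stride).
_SUBSCRIPT = re.compile(r"^(-?\d+)?(:)?(-?\d+)?$")


def subscript_value(value: str, subscript: str) -> str:
    """Apply a subscript such as "0", "-1", "2:3" or ":" to value."""
    if value == "":
        return ""  # Subscripting an empty variable always yields "".
    match = _SUBSCRIPT.match(subscript.strip())
    if match is None:
        raise ValueError(f"unsupported subscript: [{subscript}]")
    start, colon, stop = match.groups()
    if colon:
        # Python slices already clamp out-of-range bounds.
        return value[slice(int(start) if start else None,
                           int(stop) if stop else None)]
    if start is None:
        raise ValueError("empty subscript")
    # Size an out-of-range index down to the nearest valid position
    # (doing the same for negative indexes is an assumption).
    index = max(-len(value), min(int(start), len(value) - 1))
    return value[index]


print(subscript_value("ABCDEF", "100"))      # F
print(subscript_value("ABCDEF", "1:10000"))  # BCDEF
```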
The variable syntax supports a C-like "ternary IF" statement. The general form is:
${variable == "SOMESTRING" ? "TRUESTRING" : "FALSESTRING"}
${variable != "SOMESTRING" ? "TRUESTRING" : "FALSESTRING"}
Rules:
- The braces are not optional.
- The strings (`SOMESTRING`, `TRUESTRING` and `FALSESTRING`) must be surrounded by double quotes. Single quotes are not supported.
- Simple variable substitutions (`$var`, `${var}`, `${var[0]}`, etc.) are permitted within the quoted strings, but the quotes are still required. Ternary IFs and inline editing are not supported within a ternary IF.
- The white space is optional.
- When using a ternary IF substitution, you must surround the entire string in single quotes. The string has to be quoted to prevent the YAML parser from getting confused by the embedded ":" character.
- To use a literal double quote within one of the ternary expressions, escape it with `\"`.
Examples:
Substitute the string "FOO" if variable "foo" equals "foo". Otherwise, substitute the string "BAR":
${foo == "foo" ? "FOO" : "BAR"}
Substitute the string "-solution" if variable "notebook_type" is "answers". Otherwise, substitute nothing:
${notebook_type=="answers"?"-solution":""}
Variables within the ternary expressions:
${foo == "$bar" ? "It matches $$bar." : "It's $foo, not $bar"}
Note that the double quotes around the comparison string and around both result strings are REQUIRED.
${x == "abc${foo}def" ? "YES" : "NO."}
Double quote (") as part of a value being tested:
${foo == "\"" ? "QUOTE" : "NOT QUOTE"}
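The following Python sketch evaluates a single ternary IF expression. It illustrates the rules above under simplifying assumptions and is not bdc's parser. `evaluate_ternary` is hypothetical. Inside the quoted strings, it expands only plain `$var` and `${var}` references plus the `$$` and backslash escapes (subscripts are not handled), and undefined variables raise `KeyError`.

```python
import re
from typing import Mapping

# A double-quoted string in which \" (or any other backslash pair) is an escape.
_QUOTED = r'"((?:\\.|[^"\\])*)"'
_TERNARY = re.compile(
    r"^\$\{\s*([A-Za-z0-9_]+)\s*(==|!=)\s*" + _QUOTED
    + r"\s*\?\s*" + _QUOTED + r"\s*:\s*" + _QUOTED + r"\s*\}$"
)
# Inside a quoted string: $$, a backslash escape, ${var}, or $var.
_INNER = re.compile(r"\$\$|\\(.)|\$\{([A-Za-z0-9_]+)\}|\$([A-Za-z0-9_]+)")


def _expand(text: str, variables: Mapping[str, str]) -> str:
    """Expand simple variable references inside one quoted string."""
    def repl(m: re.Match) -> str:
        if m.group(0) == "$$":
            return "$"
        if m.group(1) is not None:
            return m.group(1)  # \" -> ", \\ -> \
        return variables[m.group(2) or m.group(3)]
    return _INNER.sub(repl, text)


def evaluate_ternary(expr: str, variables: Mapping[str, str]) -> str:
    """Evaluate ${var == "X" ? "T" : "F"} or ${var != "X" ? "T" : "F"}."""
    match = _TERNARY.match(expr)
    if match is None:
        raise ValueError(f"not a ternary expression: {expr}")
    name, op, test, if_true, if_false = match.groups()
    equal = variables[name] == _expand(test, variables)
    chosen = if_true if equal == (op == "==") else if_false
    return _expand(chosen, variables)


print(evaluate_ternary('${notebook_type=="answers"?"-solution":""}',
                       {"notebook_type": "answers"}))  # -solution
```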
`bdc` supports basic sed-like inline editing of a variable's value during substitution. The syntax is reminiscent of (but a little easier to read than) the `bash` variable-editing syntax. The basic syntax is:
${var/regex/replacement/flags}
${var|regex|replacement|flags}
Note that only two delimiters are supported, "|" and "/", and they must match.
By default, the first match of the regular expression in the variable's value is replaced with the replacement. (You can request a global replacement with a flag. See `flags`, below.)
regex
`regex` is a standard Python regular expression.
Within the pattern, you can escape the delimiter with a backslash. For instance:
${foo/abc\/def/abc.def/}
However, it's usually easier and more readable just to use the alternate delimiter:
${foo|abc/def|abc.def|}
replacement
`replacement` is the replacement string. Within this string:
- You can escape the delimiter with a leading backslash (though, as with `regex`, it's usually more readable to use the alternate delimiter).
- You can refer to regular expression groups as "$1", "$2", etc.
- You can escape a literal dollar sign with a backslash.
- Simple variable substitutions (`$var`, `${var}`, `${var[0]}`, etc.) are permitted in the replacement. Ternary IFs and nested inline editing are not supported.
flags
Two optional flags are supported:
- `i`: do case-blind matching
- `g`: substitute all matches, not just the first one

To specify both, use `gi` or `ig`.
Examples
Assume the following variables:
foo: Hello
filename: 01-Why-Spark.py
basename: 01-Why-Spark
- `${filename/\d/X/}` yields "X1-Why-Spark.py"
- `${filename/\d/X/g}` yields "XX-Why-Spark.py"
- `${basename/(\d+)(-.*)$/$1s$2/}` yields "01s-Why-Spark"
- `${filename/\.py//}` yields "01-Why-Spark"
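Here is a Python sketch of the inline-editing rules, built on the standard `re` module. It is not bdc's code. `edit_value` is a hypothetical helper that receives the variable's value together with the already-separated `regex`, `replacement` and `flags` parts. It translates `$1`, `$2`, ... into group references and treats `\$` as a literal dollar sign. Splitting a `${var/.../.../...}` expression on unescaped delimiters, and expanding variables inside the replacement, are deliberately left out.

```python
import re

# In the replacement: \$ is a literal dollar sign, $N is a group reference.
_GROUP_REF = re.compile(r"\\\$|\$(\d+)")


def edit_value(value: str, regex: str, replacement: str, flags: str = "") -> str:
    """Apply ${var/regex/replacement/flags}-style editing to value."""
    unknown = set(flags) - {"g", "i"}
    if unknown:
        raise ValueError(f"unsupported flags: {''.join(sorted(unknown))}")
    pattern = re.compile(regex, re.IGNORECASE if "i" in flags else 0)
    count = 0 if "g" in flags else 1  # for re.sub, count=0 means "replace all"

    def repl(match: re.Match) -> str:
        # Expand $N references by hand so other text in the replacement
        # (including backslashes) is inserted literally.
        def group_ref(ref: re.Match) -> str:
            if ref.group(1) is None:
                return "$"
            return match.group(int(ref.group(1))) or ""
        return _GROUP_REF.sub(group_ref, replacement)

    return pattern.sub(repl, value, count=count)


print(edit_value("01-Why-Spark", r"(\d+)(-.*)$", "$1s$2"))  # 01s-Why-Spark
```

The four documented examples correspond to `edit_value(filename, r"\d", "X")`, `edit_value(filename, r"\d", "X", "g")`, `edit_value(basename, r"(\d+)(-.*)$", "$1s$2")` and `edit_value(filename, r"\.py", "")`.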
Mustache is a very simple template language. For full details, see the Mustache manual page. For our purposes, the two most useful constructs are conditional content and variable substitution.
Here's an example of conditional content:
{{#amazon}}
Please run this course in Databricks, using the Amazon AWS cloud.
{{/amazon}}
{{#azure}}
Please run this course using Azure Databricks.
{{/azure}}
If the variable "amazon" has a non-empty value (or is `true`), then the first string will be included; otherwise, it'll be suppressed. Likewise, if the variable "azure" has a non-empty value (or is `true`), then the second string will be included; otherwise, it'll be suppressed.
This is Mustache's form of an if statement. There is no else statement.
There is a kind of "if not", however: simply replace the `#` with a `^`.
{{^amazon}}
Rendered if amazon is not defined.
{{/amazon}}
This construct also works inline:
Mount your {{#amazon}}S3 bucket{{/amazon}}{{#azure}}blob store{{/azure}}
to DBFS.
Variable substitution is quite simple: just enclose the variable's name in `{{` and `}}`. For example:
This is {{course_info.title}}, version {{course_info.version}}
If the course title is "A Very Cool Course", and the course version is 1.0.0, the above string will render as:
This is A Very Cool Course, version 1.0.0
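Real builds use a full Mustache implementation. The following toy Python sketch covers only the three constructs described in this section: sections, inverted sections, and `{{name}}` substitution, including dotted names such as `course_info.title`. It is meant to show how the rendering rules interact. `render` is a hypothetical function. Unlike standard Mustache, which HTML-escapes `{{name}}` output, it does no escaping. It also does not support lists, partials, nested sections that reuse the same name, or any other Mustache features.

```python
import re
from typing import Any, Mapping

# {{#name}}...{{/name}} or {{^name}}...{{/name}}; the same name must close it.
_SECTION = re.compile(r"\{\{([#^])\s*([\w.]+)\s*\}\}(.*?)\{\{/\s*\2\s*\}\}", re.DOTALL)
# {{name}} or {{dotted.name}}
_VARIABLE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def _lookup(context: Mapping[str, Any], name: str) -> Any:
    """Resolve a possibly dotted name; return None if any part is missing."""
    value: Any = context
    for part in name.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def render(template: str, context: Mapping[str, Any]) -> str:
    """Render sections, inverted sections and {{name}} tags (toy subset)."""
    def section(match: re.Match) -> str:
        kind, name, body = match.groups()
        truthy = bool(_lookup(context, name))  # missing, false and "" are falsy
        show = truthy if kind == "#" else not truthy
        return _SECTION.sub(section, body) if show else ""

    def variable(match: re.Match) -> str:
        value = _lookup(context, match.group(1))
        return "" if value is None else str(value)

    return _VARIABLE.sub(variable, _SECTION.sub(section, template))


ctx = {"amazon": True, "azure": False,
       "course_info": {"title": "A Very Cool Course", "version": "1.0.0"}}
print(render("This is {{course_info.title}}, version {{course_info.version}}", ctx))
# This is A Very Cool Course, version 1.0.0
```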
NOTICE
- This software is copyright © 2017-2021 Databricks, Inc., and is released under the Apache License, version 2.0. See LICENSE.txt in the main repository for details.
- Databricks cannot support this software for you. We use it internally, and we have released it as open source, for use by those who are interested in building similar kinds of Databricks notebook-based curriculum. But this software does not constitute an official Databricks product, and it is subject to change without notice.