
3. Notebook Markup

Brian Clapper edited this page Oct 28, 2019 · 2 revisions

The master parser component parses Databricks source notebooks and, based on specialized markup embedded within the notebooks, produces (possibly multiple) output notebooks.

The master parser is automatically invoked by bdc, whenever a notebook in the build file has a master section.


Notebook Processing

The tool looks for various labels, as well as language-specific tokens, within notebook cells.

Language tokens

By default, the master parse tool processes %scala, %python, %r and %sql cells specially. How it handles those cells is best described by example.

Suppose you've run the tool on a Python notebook (i.e., a file ending in .py), and the notebook also contains some %scala and %r cells.

  • If you've specified --scala (on the command line) or passed scala=True (to master_parse.process_notebooks()), then the tool will create Scala notebooks that contain all non-code cells (Markdown cells, %fs and %sh cells, etc.) in the original, as well as any %scala cells. All other language code cells will be stripped from the Scala notebooks.
  • If you've specified --rproject (on the command line) or passed r=True (to master_parse.process_notebooks()), then the tool will create R notebooks that contain all non-code cells (Markdown cells, %fs and %sh cells, etc.) in the original, as well as any %r cells. All other language code cells will be stripped from the R notebooks.
  • If you've specified --python (on the command line) or passed python=True (to master_parse.process_notebooks()), then the tool will create Python notebooks that contain all non-code cells (Markdown cells, %fs and %sh cells, etc.) in the original, as well as any Python cells. Since the file ends in .py, Python cells are assumed to be cells with explicit %python magic or any non-decorated (i.e., normal) code cells.

You can modify this behavior somewhat, using the labels below.

Master parse labels

Master parse labels are special tokens, recognized only by the master parse tool, that mark cells. Some labels make sense only in code cells. Others can be used in code cells, Markdown cells, etc.

All labels must be preceded by a comment sequence. For instance:

# TODO
// TODO
-- TODO

Labels must appear on a line by themselves. Thus, use:

%md
// SCALA_ONLY

not

%md // SCALA_ONLY
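As an illustration only, the two rules above (a leading comment sequence, and the label alone on its line) could be checked with a regular expression like the one below. This is a hypothetical sketch, not the master parser's actual code. The find_label name and the KNOWN_LABELS subset are assumptions, and labels that take arguments (TEST with an annotation, VIDEO url) are deliberately not handled.

```python
import re

# Hypothetical subset of the labels documented on this page.
KNOWN_LABELS = {"TODO", "ANSWER", "TEST", "SCALA_ONLY", "PYTHON_ONLY",
                "SQL_ONLY", "R_ONLY", "ALL_NOTEBOOKS", "SOURCE_ONLY"}

# A comment sequence (#, // or --), then an upper-case word, and nothing
# else on the line apart from surrounding whitespace.
LABEL_LINE = re.compile(r"^\s*(?:#|//|--)\s*([A-Z_]+)\s*$")


def find_label(line):
    """Return the label on this line, or None if it is not a label line."""
    match = LABEL_LINE.match(line)
    if match and match.group(1) in KNOWN_LABELS:
        return match.group(1)
    return None


print(find_label("// SCALA_ONLY"))      # SCALA_ONLY
print(find_label("-- TODO"))            # TODO
print(find_label("%md // SCALA_ONLY"))  # None: label not on its own line
```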

Unlabeled

Cells not marked with any label are handled as follows, depending on the cell type:

  • %md, %md-sandbox: Markdown cells appear in all output notebooks, unless suppressed, for example, with SCALA_ONLY, PYTHON_ONLY, INSTRUCTOR_NOTE, etc.

  • %fs and %sh cells appear in all output notebooks, unless explicitly suppressed.

  • Code cells only appear in the output notebook for their language, unless marked with ALL_NOTEBOOKS. Thus, a Scala cell only shows up in Scala notebooks, unless marked with ALL_NOTEBOOKS.
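The default rules above amount to a small decision function. The sketch below is hypothetical: the function name, the cell-type strings, and the assumption that an undecorated code cell has already been mapped to the notebook's base language are illustrative choices, not the master parser's API.

```python
# Cell types that are not language code cells.
NON_CODE_CELLS = {"md", "md-sandbox", "fs", "sh"}


def cell_included_by_default(cell_type, output_language, all_notebooks=False):
    """Hypothetical sketch: does a cell with no *_ONLY label reach this notebook?

    cell_type is the cell's magic without the '%' ("md", "fs", "scala", ...);
    an undecorated code cell is assumed to have been mapped to the notebook's
    base language already (e.g. "python" for a .py notebook).
    """
    if cell_type in NON_CODE_CELLS:
        return True                      # Markdown, %fs, %sh: every notebook
    if all_notebooks:
        return True                      # code cell marked ALL_NOTEBOOKS
    return cell_type == output_language  # code cell: its own language only
```

For example, a %scala cell is kept when generating Scala notebooks and dropped from Python notebooks, unless it carries ALL_NOTEBOOKS.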

Examples

In a Scala code cell:

// ANSWER
// Scala answer goes here

In a Markdown cell in a Python notebook:

%md
-- SCALA_ONLY
This Markdown cell is in a Python notebook, but it only appears in Scala
notebooks generated by the master parse tool.

Valid labels

The valid labels are:

IPYTHON_ONLY

This label is deprecated and will be removed in a future release of this tool. Use of it will generate warnings.

Cells that need to be in IPython (or Jupyter) notebooks only. If IPython notebooks aren't being generated, these cells are stripped out.

DATABRICKS_ONLY

This label is deprecated and will be removed in a future release of this tool. Use of it will generate warnings.

Cells that need to be in Databricks notebooks only.

SCALA_ONLY, PYTHON_ONLY, SQL_ONLY, R_ONLY

Cells marked with one of these labels show up only when generating notebooks for the corresponding language. These labels are for special cells (such as Markdown, %fs and %sh cells) that you want to include on a language-dependent basis. For example, if a Markdown cell is different for Scala vs. Python, then you can create two %md cells, with one marked PYTHON_ONLY and the other marked SCALA_ONLY.

AMAZON_ONLY, AZURE_ONLY

Cells marked with AMAZON_ONLY only show up when building for target profile amazon. Cells marked with AZURE_ONLY only show up when building for target profile azure.

See the -tp command line option (in Miscellaneous options, above) or the bdc setting only_in_profile (in the bdc Notebooks section).
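The language and profile labels both act as filters: a cell survives only if none of its labels points to a different language or profile. This is a hypothetical sketch with invented names. It also assumes that a build with no target profile (profile None) drops AMAZON_ONLY and AZURE_ONLY cells, which the text above does not specify.

```python
# Label -> the only language / target profile whose output keeps the cell.
LANGUAGE_LABELS = {"SCALA_ONLY": "scala", "PYTHON_ONLY": "python",
                   "SQL_ONLY": "sql", "R_ONLY": "r"}
PROFILE_LABELS = {"AMAZON_ONLY": "amazon", "AZURE_ONLY": "azure"}


def passes_language_and_profile(labels, output_language, profile):
    """Hypothetical sketch: False if any label restricts the cell elsewhere."""
    for label in labels:
        wanted_language = LANGUAGE_LABELS.get(label)
        if wanted_language is not None and wanted_language != output_language:
            return False
        wanted_profile = PROFILE_LABELS.get(label)
        if wanted_profile is not None and wanted_profile != profile:
            return False
    return True
```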

TODO

Cells show up only in exercise notebooks. These cells are usually exercises the user needs to complete.

As a special case, if the entire TODO cell is commented out, the master parser strips the first level of comments. This allows for runnable TODO cells in source notebooks. Thus, the following three TODO cells are functionally equivalent in the output notebooks:

Not runnable in source notebook:

# TODO
x = FILL_THIS_IN

Runnable in source notebook:

# TODO
#x = FILL_THIS_IN

# TODO
# x = FILL_THIS_IN

All three cells will render as follows in the Python exercises output notebook:

# TODO
x = FILL_THIS_IN

NOTES:

  1. When you create a runnable TODO cell, you can use at most one blank character after the leading comment. (The blank is optional.) The master parser will remove the leading comment and, optionally, one subsequent blank from every commented line except for the line with the "TODO" marker.

  2. Do not precede TODO with multiple comment characters, even in a runnable TODO cell. It won't work. That is, use // TODO or # TODO, not // // TODO or # # TODO. The latter won't be recognized as a proper TODO cell.
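The uncommenting rule in note 1 can be sketched as follows. This is a hypothetical illustration, not the master parser's code: it assumes the TODO marker is the cell's first line, treats only a space as the optional "blank", and leaves a cell unchanged if any non-blank body line is not commented.

```python
def uncomment_todo_cell(lines, comment="#"):
    """Hypothetical sketch of the runnable-TODO rule described above.

    lines[0] is assumed to be the TODO marker line and is left untouched.
    If every other non-blank line starts with `comment`, that comment and at
    most one following space are removed from each of those lines;
    otherwise the cell is returned unchanged.
    """
    marker, body = lines[0], lines[1:]
    if not all(line.startswith(comment) for line in body if line.strip()):
        return list(lines)  # not entirely commented out
    result = [marker]
    for line in body:
        if line.startswith(comment):
            line = line[len(comment):]
            if line.startswith(" "):
                line = line[1:]  # drop at most one blank
        result.append(line)
    return result
```

Removing at most one blank is what preserves indentation: "//   42" becomes "  42", not "42".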

ANSWER

Cells show up only in instructor and answer notebooks.

TEST

These cells identify tests and usually follow an exercise cell. Test cells provide a means for a student to test the solution to an exercise. You can include an annotation after the word TEST. For example:

# TEST - Please run this cell to test your solution.

If you don't supply an annotation, the tool will add one. So, this line:

// TEST

will be emitted, in the generated notebooks, as:

// TEST - Run this cell to test your solution.
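A hypothetical sketch of the default-annotation rule: only a bare TEST label (nothing after TEST) gets the default text appended, and an existing annotation is left alone. The function and constant names are invented for this example.

```python
import re

DEFAULT_ANNOTATION = "Run this cell to test your solution."

# A bare TEST label: comment sequence, TEST, and nothing else on the line.
BARE_TEST = re.compile(r"^(\s*(?:#|//|--)\s*TEST)\s*$")


def annotate_test_label(line):
    """Hypothetical sketch: add the default annotation to a bare TEST label."""
    match = BARE_TEST.match(line)
    if match:
        return f"{match.group(1)} - {DEFAULT_ANNOTATION}"
    return line  # already annotated, or not a TEST line
```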

PRIVATE_TEST

Cells show up in instructor/answer notebooks.

VIDEO

Valid only in Markdown cells, this command is replaced with HTML for a large video button. When clicked, the button opens the specified URL in a new tab. The command takes the form VIDEO url [title]. url is the link to the video. The title (optional) is the video's title which, if present, will appear in the button. If no title is supplied, the button will not contain a title.

INSTRUCTOR_NOTE

INSTRUCTOR_ONLY and INSTRUCTOR_NOTES are both aliases for this tag.

Valid only in Markdown cells, this command causes the cell to be copied into the instructor notebook (if instructor notebooks are being generated) and omitted from the exercises and answers notebooks. An "Instructor Note" header will automatically be added to the cell.

In addition, if consolidated instructor notes are enabled for the notebook, cells marked with -- INSTRUCTOR_NOTE are consolidated and copied into a single Markdown document associated with the notebook.

SOURCE_ONLY

Valid in any cell, this tag marks a cell as a source-only cell. Source-only cells are never copied to output notebooks. Source-only cells are useful for many things, such as cells with credentials that are only to be used during curriculum development.

ILT_ONLY

An ILT_ONLY cell is only copied to output notebooks if the course type is "ilt". See the -ct (--content-type) command line parameter.

SELF_PACED_ONLY

A SELF_PACED_ONLY cell is only copied to output notebooks if the course type is "self-paced". See the -ct (--content-type) command line parameter.
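The labels from TODO through SELF_PACED_ONLY decide which kinds of notebooks receive a cell. Collected into one table, they look roughly like this. This is a hypothetical sketch: the notebook-type names ("exercises", "answers", "instructor") and the copied_to function are assumptions, and labels not listed (such as the *_ONLY language labels) are treated as not restricted here because other rules handle them.

```python
# Notebook types each label's cells are copied into, per the text above.
# The type names are assumptions made for this sketch.
LABEL_TARGETS = {
    "TODO": {"exercises"},
    "ANSWER": {"instructor", "answers"},
    "PRIVATE_TEST": {"instructor", "answers"},
    "INSTRUCTOR_NOTE": {"instructor"},
    "SOURCE_ONLY": set(),  # never copied to any output notebook
}
ALIASES = {"INSTRUCTOR_ONLY": "INSTRUCTOR_NOTE",
           "INSTRUCTOR_NOTES": "INSTRUCTOR_NOTE"}
COURSE_TYPE_LABELS = {"ILT_ONLY": "ilt", "SELF_PACED_ONLY": "self-paced"}


def copied_to(label, notebook_type, course_type):
    """Hypothetical sketch: is a cell with this label copied to this notebook?"""
    label = ALIASES.get(label, label)
    if label in COURSE_TYPE_LABELS:
        return COURSE_TYPE_LABELS[label] == course_type
    if label in LABEL_TARGETS:
        return notebook_type in LABEL_TARGETS[label]
    return True  # not decided by notebook or course type in this sketch
```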

ALL_NOTEBOOKS

The cell should be copied into all generated notebooks, regardless of language. Consider the following code in a Scala notebook:

%python
# ALL_NOTEBOOKS
x = 10

If you run the master parse tool to create Scala and Python notebooks, with instructor and student notebooks, that cell will appear in the generated Scala notebooks (instructor and answers) as well as in the generated Python notebooks (instructor and answers).

INLINE

This label is deprecated and will be removed in a future release of this tool. Use of it will generate warnings.

Can be used for multilanguage notebooks to force another language to be inserted. The behavior is a little counterintuitive. Here's an example.

You're processing a notebook called foo.scala, so the base language is Scala. The notebook has these cells somewhere inside:

%python
# INLINE
x = 10

// INLINE
val y = 100

The first cell is a Python cell that would normally be suppressed in the Scala output notebook; it would either be written to the output Python notebook or suppressed entirely (if Python output was disabled).

However, because of the # INLINE label, the cell is written to the output Scala notebook instead, and suppressed in the output Python notebook.

Meanwhile, the opposite happens with the second cell. Because the second cell is Scala, but is marked as // INLINE, it is only written to non-Scala output notebooks.
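The routing in this example can be summarized in a few lines. This is a hypothetical sketch that follows the foo.scala example only; how INLINE cells behave for output languages the example does not mention (for instance an R notebook) is an assumption.

```python
def inline_cell_targets(cell_language, base_language, output_languages):
    """Hypothetical sketch of where a cell marked INLINE is written.

    Follows the foo.scala example above; behaviour for languages the
    example does not mention (e.g. an R notebook) is an assumption.
    """
    outputs = set(output_languages)
    if cell_language == base_language:
        # Base-language cell: only the non-base output notebooks get it.
        return outputs - {base_language}
    # Other-language cell: written to the base-language notebook instead
    # of its own language's notebook.
    return outputs & {base_language}
```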

NEW_PART

Start a new part of the lab. A lab can be divided into multiple parts with each part starting with a cell labeled NEW_PART. Every time the tool encounters a NEW_PART label, it creates a new notebook that starts with a cell that runs the previous part notebook (via %run), which enables students who are lagging behind to catch up.
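A sketch of the splitting step, under stated assumptions: cells are plain source strings, the NEW_PART cell becomes the first cell of its part (after the %run cell), and the "<base>-partN" naming scheme is invented for this example. The real tool's part names and label handling may differ.

```python
import re

NEW_PART = re.compile(r"^[ \t]*(?:#|//|--)[ \t]*NEW_PART[ \t]*$", re.MULTILINE)


def split_into_parts(cells, base_name):
    """Hypothetical sketch: split cell sources into parts at NEW_PART labels.

    Every part after the first begins with a %run cell for the previous
    part. The "<base>-partN" naming scheme is invented for this example.
    """
    parts = [[]]
    for cell in cells:
        if NEW_PART.search(cell) and parts[-1]:
            parts.append([])  # this labeled cell starts a new part
        parts[-1].append(cell)

    names = [f"{base_name}-part{i}" for i in range(1, len(parts) + 1)]
    result = []
    for i, part in enumerate(parts):
        if i > 0:
            part = [f"%run ./{names[i - 1]}"] + part
        result.append((names[i], part))
    return result
```

Chaining each part to its predecessor means running part N also runs parts 1 through N-1, which is what lets a lagging student catch up with one cell.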

Master parse inline tokens

The master parser also supports special inline tokens in Markdown cells. These tokens are replaced with images and, sometimes, markup. The four currently supported tokens are:

  • :HINT: A hint for the student.
  • :CAUTION: A caution or warning.
  • :BESTPRACTICE: Indicates a best practice.
  • :SIDENOTE: Something of note that’s not necessarily 100% pertinent to the rest of the cell.

Here's an example cell containing each token:

%md
We're talking about life here, people. This is some important stuff. Pay attention.

:HINT: Don't worry too much.

:CAUTION: Stress'll kill ya, man.

:BESTPRACTICE: Eat right, and get plenty of rest.

:SIDENOTE: No one gets out alive.

Currently, these tokens render as images (and, sometimes, additional markup) in a %md-sandbox cell.

Cells as templates

The master parser supports treating Markdown cells (%md and %md-sandbox cells) as templates. This feature is disabled by default, but it can be enabled:

  • on a per-notebook basis in build.yaml, by setting the enable_templates field in the master section;
  • via the --templates command line option, if you're calling the master parser from the command line; or
  • via a parameter setting to the API, if you're calling the master parser programmatically.

When templates are enabled, Markdown cells are treated as Mustache templates. Using Mustache in notebook cells allows you to:

  • do conditional substitution. For instance, insert this sentence if building for Azure, but use this other sentence if building for Amazon.
  • do token substitution. For instance, substitute the current value of this parameter here.

See below for a brief introduction to Mustache syntax.

Variables you can test or substitute

The master parser defines the following variables automatically:

  • amazon: Set to "Amazon" (which also evaluates as true in a template), if building for Amazon. Otherwise, set to an empty string (which also evaluates as false in a template).
  • azure: Set to "Azure" (which also evaluates as true in a template), if building for Azure. Otherwise, set to an empty string (which also evaluates as false in a template).
  • copyright_year: The value of the copyright year parameter.
  • notebook_language: The programming language of the notebook being generated (e.g., "Scala", "Python", "R", "SQL").
  • scala: true if the output notebook is Scala, false otherwise.
  • python: true if the output notebook is Python, false otherwise.
  • r: true if the output notebook is R, false otherwise.
  • sql: true if the output notebook is SQL, false otherwise.
  • self_paced: true if the build is a self-paced build; false if it is an ILT build.
  • ilt: true if the build is an ILT build; false if it is a self-paced build.

In addition, you can substitute any variables defined in the bdc build file's variables section.

If calling the master parser from the command line, there's a --variable parameter that allows you to pass additional variables.
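The built-in variables listed above can be pictured as a dictionary computed per output notebook. This is a hypothetical sketch: the function name, argument names, and value spellings ("amazon", "self-paced") are assumptions, and letting built-in names win over same-named build-file or --variable entries is a guess, not documented behavior.

```python
# Display names used for notebook_language.
LANGUAGE_NAMES = {"scala": "Scala", "python": "Python", "r": "R", "sql": "SQL"}


def template_variables(profile, language, copyright_year, course_type,
                       build_variables=None):
    """Hypothetical sketch of the variable set listed above.

    profile: "amazon", "azure" or None; language: "scala", "python", "r"
    or "sql"; course_type: "ilt" or "self-paced".
    """
    variables = {
        "amazon": "Amazon" if profile == "amazon" else "",
        "azure": "Azure" if profile == "azure" else "",
        "copyright_year": copyright_year,
        "notebook_language": LANGUAGE_NAMES[language],
        "scala": language == "scala",
        "python": language == "python",
        "r": language == "r",
        "sql": language == "sql",
        "self_paced": course_type == "self-paced",
        "ilt": course_type == "ilt",
    }
    # Extra variables (bdc "variables" section, --variable). Letting the
    # built-ins win on a name clash is an assumption of this sketch.
    for name, value in (build_variables or {}).items():
        variables.setdefault(name, value)
    return variables
```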

Built-in conditional logic

The Mustache templating also provides some other convenient expansions, each of which is described here.

Incrementally Revealable Hints

The parser supports a special nested block, in Markdown cells only, for revealable hints. The {{#HINTS}} construct introduces a block of hints (and is closed by {{/HINTS}}); such a construct contains one or more revealable hints and an optional answer.

This construct is best described by example. Consider the following Markdown cell:

%md

This is a pithy description of an exercise you are to perform, below.

{{#HINTS}}

{{#HINT}}Revealable hint 1.{{/HINT}}

{{#HINT}}  

Revealable hint 2. Note that the source for this one
is multiple lines _and_ contains some **Markdown** to be
rendered.

{{/HINT}}

{{#ANSWER}}

Still no luck? Here's your answer:

```
df = spark.read.option("inferSchema", "true").option("header", "true").csv("dbfs:/tmp/foo.csv")
df.limit(10).show()
```

{{/ANSWER}}

{{/HINTS}}

When run through the master parser, the above renders a cell whose hints are revealed incrementally: each button click reveals the next hint, and the final (third) click reveals the answer.

More formally:

A hints block:

  • must contain at least one hint block. A hint is Markdown or HTML in between a starting {{#HINT}} and an ending {{/HINT}}.

  • may contain multiple {{#HINT}} blocks.

  • may contain an {{#ANSWER}} block.

{{#HINTS}}, {{#HINT}} and {{#ANSWER}} blocks may contain leading and trailing blank lines, to aid source readability; those lines are stripped on output.
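The structural rules above can be checked with a small parser over the text between {{#HINTS}} and {{/HINTS}}. This is a hypothetical sketch, not the master parser's implementation; parse_hints is an invented name, and rejecting more than one ANSWER block is an assumption based on the wording "may contain an {{#ANSWER}} block".

```python
import re

# One {{#HINT}}...{{/HINT}} or {{#ANSWER}}...{{/ANSWER}} block. The name must
# be followed directly by "}}", so {{#HINTS}} / {{/HINTS}} are not matched.
INNER_BLOCK = re.compile(r"\{\{#(HINT|ANSWER)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)


def strip_blank_lines(text):
    """Drop leading and trailing lines that contain only whitespace."""
    lines = text.split("\n")
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


def parse_hints(hints_body):
    """Hypothetical sketch: validate the body of a {{#HINTS}} block.

    Returns (hints, answer); answer is None when there is no ANSWER block.
    Allowing at most one ANSWER is an assumption based on the wording above.
    """
    hints, answers = [], []
    for kind, body in INNER_BLOCK.findall(hints_body):
        (hints if kind == "HINT" else answers).append(strip_blank_lines(body))
    if not hints:
        raise ValueError("a HINTS block must contain at least one HINT block")
    if len(answers) > 1:
        raise ValueError("a HINTS block may contain only one ANSWER block")
    return hints, (answers[0] if answers else None)
```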

Basic Mustache Syntax

Mustache is a very simple template language. For full details, see the Mustache manual page. For our purposes, the two most useful constructs are conditional content and variable substitution.

Here's an example of conditional content:

{{#amazon}}
Rendered if amazon is defined.
{{/amazon}}

If the variable "amazon" has a non-empty value (or is true), then the string "Rendered if amazon is defined" is included in the cell. Otherwise, the entire construct is omitted.

This is Mustache's form of an if statement. There is no else statement. There's a kind of if not, however: Simply replace the # with a ^.

{{^amazon}}
Rendered if amazon is not defined.
{{/amazon}}

This construct also works inline:

Mount your {{#amazon}}S3 bucket{{/amazon}}{{#azure}}blob store{{/azure}}
to DBFS.

Variable substitution is quite simple: Just enclose the variable's name in {{ and }}. For example:

This is a {{notebook_language}} notebook.

If notebook_language is set to "Scala", that line will render as:

This is a Scala notebook.

Example

For a more complete example, consider this Markdown cell:

%md

In this {{notebook_language}} notebook,
you can access your data by mounting your
{{#amazon}}
S3 bucket
{{/amazon}}
{{#azure}}
Azure blob store
{{/azure}}
to DBFS.

When generated with an Amazon profile, in a Scala output notebook, this cell would become:

%md

In this Scala notebook,
you can access your data by mounting your
S3 bucket
to DBFS.
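To make the rules above concrete, here is a toy renderer for just the three constructs this page uses: {{#name}} sections, {{^name}} inverted sections, and {{name}} substitution. It is not a Mustache implementation and not what the master parser uses; real Mustache implementations also HTML-escape {{name}} output, iterate over lists, support partials, and more. It does mimic Mustache's rule that a section tag alone on its line removes that whole line, which is why the example above renders without blank lines.

```python
import re

# A section tag alone on its line: the whole line is dropped ("standalone").
STANDALONE = re.compile(r"^[ \t]*(\{\{[#^/]\w+\}\})[ \t]*\n", re.MULTILINE)
# {{#name}}...{{/name}} or {{^name}}...{{/name}} (same-name nesting unsupported).
SECTION = re.compile(r"\{\{([#^])(\w+)\}\}(.*?)\{\{/\2\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render(template, context):
    """Toy renderer for the three Mustache constructs described above."""
    text = STANDALONE.sub(r"\1", template)

    def section(match):
        kind, name, body = match.groups()
        # '#' keeps the body when the value is truthy, '^' when it is falsy.
        keep = bool(context.get(name)) == (kind == "#")
        return body if keep else ""

    while True:  # repeat so that sections nested inside kept bodies expand
        expanded = SECTION.sub(section, text)
        if expanded == text:
            break
        text = expanded
    return VARIABLE.sub(lambda m: str(context.get(m.group(1), "")), text)
```

With {"notebook_language": "Scala", "amazon": "Amazon", "azure": ""}, the Markdown cell above renders to the same four lines shown in the Amazon/Scala output.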