This repository was archived by the owner on Feb 11, 2026. It is now read-only.

Commit c4ae6b1

feat: move validate notebooks action (#15)
Move notebook validation script to `ci-actions`.

Looking for advice on whether to include a workflow YAML, and what it should look like.

Approved-by: ktdreyer
Approved-by: courtneypacheco
2 parents 825dab5 + 5df1bee commit c4ae6b1

4 files changed: +135 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Below is a list of the in-house GitHub actions stored in this repository:
| [detect-exposed-workflow-secrets](./actions/detect-exposed-workflow-secrets/detect-exposed-workflow-secrets.md) | Used to detect when a contributor's PR would expose a GitHub secret through one or more workflow files that auto-trigger on PRs, and aborts that contributor's PR build before it can start. | <ul><li>Prevent accidental secrets exposure through GitHub workflow files that auto-trigger on PR builds.</li></ul> |
| [free-disk-space](./actions/free-disk-space/free-disk-space.md) | Used to reclaim disk space on either a GitHub or EC2 runner. | <ul><li>If a CI job tends to fail due to "insufficient disk space"</li><li>If you want to reduce cloud costs by reclaiming disk space instead of increasing your writeable cloud storage to compensate for a bloated EC2 image</li></ul> |
| [launch-ec2-runner-with-fallback](./actions/launch-ec2-runner-with-fallback/launch-ec2-runner-with-fallback.md) | Used to launch an EC2 instance in AWS, either as a spot instance or a dedicated instance. If your preferred availability zone lacks availability for your instance type, "backup" availability zones will be tried. | <ul><li>Insufficient capacity in AWS (i.e., AWS lacks availability for your desired EC2 instance type in your preferred availability zone)</li><li>Cost savings (i.e., You want to try launching your EC2 runner as a spot instance first)</li></ul> |
| [validate-notebooks](./actions/validate-notebooks/validate-notebooks.md) | Used to validate `.ipynb` files. | <ul><li>If you maintain a collection of `.ipynb` files and want to quickly verify that they are well-formed before spinning up more complex or expensive CI jobs that test those notebooks</li></ul> |

## ❓ How to Use One or More In-House GitHub Actions

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
name: 'Validate Notebooks'
description: 'Validates Jupyter Notebook Files'
author: "InstructLab"

inputs:
  path:
    description: a path to a file or directory containing Jupyter notebook files; accepts multiple space-separated paths
    required: true
    type: string

runs:
  using: "composite"
  steps:
    - name: Install Dependencies
      shell: bash
      run: pip install nbformat
    - name: Validate Notebooks
      shell: bash
      # inputs.path is left unquoted so that several space-separated paths become separate arguments
      run: python ${{ github.action_path }}/validate.py ${{ inputs.path }}
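The `path` input is described as accepting multiple arguments, which works because its value is pasted into the `run:` line and then split into words by bash. The Python sketch below approximates that splitting with `shlex.split`; the helper `build_validate_command` and the example paths are hypothetical and exist only for illustration. It does not model shell glob expansion.

```python
import shlex


def build_validate_command(action_path: str, path_input: str) -> list[str]:
    """Approximate the argv that bash produces for the action's `run:` line.

    `action_path` and `path_input` stand in for `github.action_path` and
    `inputs.path`; this helper is hypothetical and not part of the action.
    """
    # shlex.split follows POSIX shell word-splitting rules, so an unquoted,
    # space-separated input becomes one argument per path.
    return ["python", f"{action_path}/validate.py", *shlex.split(path_input)]


command = build_validate_command("/opt/action", "my_notebook.ipynb notebooks/")
print(command)
```

A path that contains spaces would need its own quoting inside the input value, for example `'"my notebook.ipynb" notebooks/'`.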
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
# Validate Notebooks

## Overview

Validate Notebooks is a simple action that reads files with the `.ipynb` extension and verifies that the body of each file is a valid Jupyter notebook. It checks the underlying JSON only; it does not run the notebooks.

## When to Use It?

This action is best used as a preliminary step in any CI workflow that tests Jupyter notebooks. If any of the notebooks passed to it are not valid, it reports every error it encountered and then fails at the end of the step. This prevents more complicated or expensive jobs from starting and then failing unexpectedly.

## Usage

This is a composite action that can be referenced from any GitHub Actions workflow. Here is an example of a workflow that validates a notebook file named `my_notebook.ipynb` and all the notebooks contained in the directory `./directory_containing_notebooks/`:

```yaml
jobs:
  example-job:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout "validate-notebooks" in-house CI action
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          repository: instructlab/ci-actions
          path: ci-actions
          sparse-checkout: |
            actions/validate-notebooks
      - name: Validate Jupyter Notebooks
        uses: ./ci-actions/actions/validate-notebooks
        with:
          path: "my_notebook.ipynb ./directory_containing_notebooks/"
```
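
To make "valid Jupyter notebook" more concrete: a version 4 notebook is a JSON object with top-level `cells`, `metadata`, `nbformat`, and `nbformat_minor` fields. The sketch below checks only that outer shape using the standard library. It is a simplified illustration and not what the action runs; the action delegates the full schema check to `nbformat`. The function name `looks_like_v4_notebook` is hypothetical.

```python
import json

# Top-level fields that a version 4 notebook document is expected to carry.
REQUIRED_TOP_LEVEL_KEYS = {"cells", "metadata", "nbformat", "nbformat_minor"}


def looks_like_v4_notebook(text: str) -> bool:
    """Return True if `text` parses as JSON with the outer shape of a v4 notebook."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if not REQUIRED_TOP_LEVEL_KEYS.issubset(data):
        return False
    # Only the major version and the type of `cells` are checked here;
    # individual cells are not inspected.
    return data["nbformat"] == 4 and isinstance(data["cells"], list)


minimal = json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5})
print(looks_like_v4_notebook(minimal))
print(looks_like_v4_notebook('{"cells": []}'))
print(looks_like_v4_notebook("not json"))
```

A file can pass this outer check and still be rejected by the real schema validation, for example when a cell is missing a required field.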
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
# SPDX-License-Identifier: Apache-2.0

"""
This script validates Jupyter notebook files.
Usage: python validate.py <path> [<path> ...]
"""

# Standard
from pathlib import Path
import argparse
import os
import sys

# Third Party
import nbformat  # type: ignore[import-not-found]


def validate_notebook(file_path) -> bool:
    """
    validate notebook reads the file as a version 4 notebook and checks it against the notebook schema
    """
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            nb = nbformat.read(f, as_version=4)
        nbformat.validate(nb)
        print(f"{file_path} is a valid Jupyter notebook.")
        return True
    except nbformat.ValidationError as e:
        print(f"{file_path} is not a valid Jupyter notebook. Validation error: {e}")
        return False
    except ValueError as e:
        # a file that is not parseable JSON fails here, before schema validation
        print(f"{file_path} is not a valid Jupyter notebook. Read error: {e}")
        return False


def list_valid_files(paths: list) -> list:
    """
    list valid files finds and returns a list of .ipynb files in the given list of paths
    """

    all_files = []
    for path in paths:
        if not os.path.exists(path):
            print(f"invalid path {path}: path does not exist")
            continue
        if os.path.isfile(path):
            # check the extension itself, not just a substring of the path
            if not path.endswith(".ipynb"):
                print(f"invalid path {path}: files must be ipython notebooks")
                sys.exit(1)  # fatal error: files must be .ipynb
            all_files.append(path)
        elif os.path.isdir(path):
            # search the directory recursively for notebooks
            search_dir = Path(path)
            all_files.extend(str(file) for file in search_dir.rglob("*.ipynb"))
    return all_files


if __name__ == "__main__":
    invalid_notebook_found = False
    parser = argparse.ArgumentParser(
        description="validates jupyter notebook files by checking the underlying JSON. This will not run the notebooks."
    )
    parser.add_argument(
        "path",
        metavar="path",
        type=str,
        nargs="+",  # argparse itself rejects an empty list of paths
        help="a path to an .ipynb file, or a directory containing .ipynb files",
    )
    args = parser.parse_args()

    user_provided_paths = list_valid_files(args.path)
    if len(user_provided_paths) < 1:
        print("no valid file paths to jupyter notebooks provided")
        sys.exit(1)

    # validate every notebook before exiting so that all errors are reported
    for user_path in user_provided_paths:
        ok = validate_notebook(user_path)
        if not ok:
            invalid_notebook_found = True

    if invalid_notebook_found:
        sys.exit(1)  # indicate to ci environment that something failed
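
The path handling in `list_valid_files` can also be expressed entirely with `pathlib`. The sketch below is one possible variant, not the script's actual code: the name `collect_notebooks` is hypothetical, it returns `Path` objects throughout, and it raises `ValueError` instead of calling `sys.exit`, on the assumption that this is easier to exercise in a unit test.

```python
from pathlib import Path
import tempfile


def collect_notebooks(paths: list[str]) -> list[Path]:
    """Return every .ipynb file named by, or found under, the given paths."""
    found: list[Path] = []
    for raw in paths:
        path = Path(raw)
        if path.is_file():
            # Path.suffix is the final extension only, so "notes.ipynb.bak" is rejected.
            if path.suffix != ".ipynb":
                raise ValueError(f"invalid path {raw}: files must be .ipynb notebooks")
            found.append(path)
        elif path.is_dir():
            # Sorting keeps the order of discovered notebooks stable between runs.
            found.extend(sorted(path.rglob("*.ipynb")))
        else:
            print(f"invalid path {raw}: path does not exist")
    return found


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.ipynb").write_text("{}", encoding="utf-8")
    (root / "nested" / "b.ipynb").write_text("{}", encoding="utf-8")
    (root / "notes.txt").write_text("", encoding="utf-8")
    print([p.name for p in collect_notebooks([tmp])])
```

As in the original function, a missing path is reported and skipped, while an existing file with the wrong extension is treated as a fatal error.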
