Skip to content

feat(kernel_checkpoint): POC for kernel-state snapshot/restore#105

Draft
jamesbhobbs wants to merge 1 commit into
mainfrom
jh/poc-kernel-checkpoint-restore
Draft

feat(kernel_checkpoint): POC for kernel-state snapshot/restore#105
jamesbhobbs wants to merge 1 commit into
mainfrom
jh/poc-kernel-checkpoint-restore

Conversation

@jamesbhobbs
Copy link
Copy Markdown
Contributor

@jamesbhobbs jamesbhobbs commented May 28, 2026

Why

The dominant cold-start cost for projects with a heavy init notebook is re-running init on every fresh container boot. For data-loading / client-setup init, this is tens of seconds to minutes per cold start.

This POC ships the data plane for snapshotting the kernel's post-init globals to a pluggable store and restoring them on the next cold boot, skipping the init re-run entirely. Architecture is production-shaped so the same module ships unchanged into prod once we add an S3 backend.

What's in the box

deepnote_toolkit/kernel_checkpoint/
  __init__.py    — public surface + invariant docstring
  store.py       — SnapshotStore protocol + LocalDiskSnapshotStore impl.
                   S3-backed prod impl will be a sibling later.
  key.py         — compute_checkpoint_key(project_id, init_source_hash,
                   environment_id). Human-readable so debugging stale-
                   restore incidents is one `cat` away.
  checkpoint.py  — save_checkpoint / try_restore_checkpoint. Best-effort:
                   unpicklable values are logged-and-skipped; modules and
                   dunders / ipython artifacts are skipped by rule.
  __main__.py    — CLI for end-to-end testing without touching prod code.

Correctness invariant

A restore is correct iff the restored namespace produces the same downstream execution behaviour as freshly re-running init in the current environment.

The checkpoint key includes init source hash + environment id so source/env changes always invalidate the cache.

Verified end-to-end

$ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
    save "proj:alpha:init:abc:env:py311" /tmp/poc-init.py
[checkpoint] saved 4 globals (505 bytes)

$ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
    restore "proj:alpha:init:abc:env:py311"
[checkpoint] restored 4 globals (505 bytes)

$ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
    restore "proj:alpha:init:DIFFERENT:env:py311"
[checkpoint] no snapshot found

Test plan

  • 21/21 unit tests pass: round-trip primitives / containers / user fns / user class instances / pandas DataFrames; unpicklable values skipped-not-fatal; dunders/modules/ipython artifacts skipped by rule; key stability + invalidation; LocalDiskSnapshotStore atomic overwrite + safe key encoding
  • CLI end-to-end demonstrated (above)
  • Productionisation PR: S3SnapshotStore + hook into runtime/executor.py init lifecycle + webapp signaling of init_source_hash + env_id + per-workspace feature flag

Out of scope (productionisation TODO)

  • S3SnapshotStore — swap LocalDiskSnapshotStore in prod
  • Wiring into runtime/executor.py init lifecycle (write after init success, try-restore on cold boot)
  • Webapp signaling of (init_source_hash, environment_id)
  • Filesystem state in the cache key — init reading a file that has since changed would currently produce a stale restore. Mitigation: include a /work directory hash in the checkpoint key
  • Size cap + S3 multipart upload story for large dataframes in init
  • Cross-Python-version restore — dill bytes from 3.11 may not load on 3.12; env id is expected to discriminate Python versions

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Kernel checkpoint functionality now captures and restores kernel state after initialization, reducing cold-start time.
    • Command-line interface added for managing kernel snapshots with configurable storage location.
    • Snapshot system automatically excludes unpicklable objects and system artifacts while preserving user code and variables.
  • Tests

    • Comprehensive test suite validates checkpoint save/restore operations, key generation, and storage reliability.

Review Change Stack

Today the dominant cold-start cost for projects with a heavy init notebook
is re-running init on every fresh container boot. This POC ships the data
plane for snapshotting the kernel's post-init globals to a pluggable store
and restoring them on the next cold boot, skipping the init re-run.

# What's in the box

deepnote_toolkit/kernel_checkpoint/
  __init__.py    — public surface + invariant docstring
  store.py       — SnapshotStore protocol + LocalDiskSnapshotStore impl.
                   S3-backed prod impl will be a sibling later.
  key.py         — compute_checkpoint_key(project_id, init_source_hash,
                   environment_id). Human-readable so debugging stale-
                   restore incidents is one `cat` away.
  checkpoint.py  — save_checkpoint / try_restore_checkpoint. Best-effort:
                   unpicklable values are logged-and-skipped; modules and
                   dunders / ipython artifacts are skipped by rule.
  __main__.py    — CLI for end-to-end testing without touching prod code.

# Correctness invariant

A restore is correct iff the restored namespace produces the same downstream
execution behaviour as freshly re-running init in the current environment.

The checkpoint key includes init source hash + environment id so source/env
changes always invalidate the cache.

# Verified end-to-end

  $ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
      save "proj:alpha:init:abc:env:py311" /tmp/poc-init.py
  [checkpoint] saved 4 globals (505 bytes)

  $ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
      restore "proj:alpha:init:abc:env:py311"
  [checkpoint] restored 4 globals (505 bytes)

  $ python -m deepnote_toolkit.kernel_checkpoint --root /tmp/poc-snap \
      restore "proj:alpha:init:DIFFERENT:env:py311"
  [checkpoint] no snapshot found

# Out of scope for this PR (productionisation TODO)

  - S3SnapshotStore — swap LocalDiskSnapshotStore in prod
  - Wiring into runtime/executor.py init lifecycle
  - Webapp signaling of (init_source_hash, environment_id)
  - Filesystem state in the cache key (init reading a file that has
    since changed would currently produce a stale restore)
  - Size cap and S3 multipart upload story for large dataframes

21/21 unit tests pass including pandas DataFrame round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

📝 Walkthrough

Walkthrough

This PR adds a kernel checkpoint system to snapshot and restore Python globals post-kernel initialization. The feature reduces cold-start latency by persisting the initialized namespace. Storage uses a protocol abstraction with local-disk implementation via atomic temp-file writes. Checkpoints are keyed by project ID, initialization source hash, and environment ID. Save logic applies skip rules (modules, dunders, IPython artifacts) and records unpicklable values without aborting. Restore loads snapshots via dill. A CLI provides save/restore commands. Comprehensive tests validate roundtrip behavior, skip rules, key composition, and storage semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Updates Docs ⚠️ Warning Kernel checkpoint feature added without updating README.md or docs/. External repo docs cannot be verified. Add user documentation to README.md or docs/user/kernel_checkpoint.md and update deepnote/deepnote-internal roadmap.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: a POC for kernel-state snapshot/restore functionality. It is concise, specific, and directly reflects the main intent of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

📦 Python package built successfully!

  • Version: 2.3.0.dev4+deae9df
  • Wheel: deepnote_toolkit-2.3.0.dev4+deae9df-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.3.0.dev4%2Bdeae9df/deepnote_toolkit-2.3.0.dev4%2Bdeae9df-py3-none-any.whl"

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 60.81081% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.75%. Comparing base (46807ba) to head (a62f68d).
⚠️ Report is 12 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
deepnote_toolkit/kernel_checkpoint/__main__.py 0.00% 53 Missing ⚠️
deepnote_toolkit/kernel_checkpoint/store.py 82.75% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   74.32%   73.75%   -0.57%     
==========================================
  Files          94      100       +6     
  Lines        5534     5826     +292     
  Branches      824      860      +36     
==========================================
+ Hits         4113     4297     +184     
- Misses       1155     1253      +98     
- Partials      266      276      +10     
Flag Coverage Δ
combined 73.75% <60.81%> (-0.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepnote_toolkit/kernel_checkpoint/__main__.py`:
- Line 35: The CLI currently calls the private method store._path_for(...) in
deepnote_toolkit/kernel_checkpoint/__main__.py to build the message; instead
expose a public API (either make store.write return the final path or add a
public method like store.path_for/get_path) and update the CLI to use that
public result instead of _path_for; ensure the change returns the same path used
when writing so the print line (which references report.saved_names and
args.key) can display the path without touching private store internals.
- Line 65: The function signature for main uses the union syntax "list[str] |
None" which is inconsistent with the module's preferred use of typing.Optional;
update the main function signature to use Optional[list[str]] instead, add "from
typing import Optional" to the imports if missing, and ensure any other type
hints in this module follow the same Optional[...] convention for consistency
with store.py and checkpoint.py.

In `@deepnote_toolkit/kernel_checkpoint/checkpoint.py`:
- Around line 123-127: The code currently calls dill.loads(payload) on untrusted
data read via store.read(key); instead add integrity/authentication verification
before deserialization: expect an authenticated envelope (e.g., AEAD ciphertext
or a signed payload with signature field) and verify it using your project
crypto utilities (or add HMAC/verify using a configured secret/public key)
before calling dill.loads; if verification fails, return None or raise and do
not call dill.loads. Update checkpoint.restore logic around payload, dill.loads,
and RestoreReport to parse/verify the envelope, decrypt/unwrap or reject invalid
payloads, and only then set restored and construct RestoreReport.

In `@deepnote_toolkit/kernel_checkpoint/store.py`:
- Around line 43-46: The _path_for method currently maps keys to filenames with
simple replace, which is non-injective and allows path-traversal; replace it by
producing a deterministic, filesystem-safe filename derived from a strong hash
of the key (e.g., sha256 hex) and optionally a short sanitized/truncated
human-readable suffix, ensure the filename contains no path separators or "..",
join with self._root, then resolve and assert the resolved path is inside
self._root (use Path.resolve() and Path.is_relative_to or compare parents)
before returning to prevent escapes; update references to _path_for accordingly.

In `@tests/unit/test_kernel_checkpoint.py`:
- Around line 184-189: The tests call compute_checkpoint_key with empty inputs
but currently use broad pytest.raises(ValueError) which can catch unrelated
errors; update each assertion to use pytest.raises(ValueError, match="...") with
a specific expected message fragment (e.g., "project_id", "init_source_hash",
and "environment_id") so each raise asserts the correct validation path in
compute_checkpoint_key.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ce517a1-1400-413b-a2f7-499e959fbbdf

📥 Commits

Reviewing files that changed from the base of the PR and between 98e2371 and a62f68d.

📒 Files selected for processing (6)
  • deepnote_toolkit/kernel_checkpoint/__init__.py
  • deepnote_toolkit/kernel_checkpoint/__main__.py
  • deepnote_toolkit/kernel_checkpoint/checkpoint.py
  • deepnote_toolkit/kernel_checkpoint/key.py
  • deepnote_toolkit/kernel_checkpoint/store.py
  • tests/unit/test_kernel_checkpoint.py

exec(compile(source, args.python_file, "exec"), namespace)
store = LocalDiskSnapshotStore(args.root)
report = save_checkpoint(namespace, store, args.key)
print(f"[checkpoint] saved {len(report.saved_names)} globals to {store._path_for(args.key)}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Avoid calling a private store method from CLI.

Line 35 reaches into store._path_for(...). Expose a public helper (or return path info from write) so CLI does not depend on private internals.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/kernel_checkpoint/__main__.py` at line 35, The CLI currently
calls the private method store._path_for(...) in
deepnote_toolkit/kernel_checkpoint/__main__.py to build the message; instead
expose a public API (either make store.write return the final path or add a
public method like store.path_for/get_path) and update the CLI to use that
public result instead of _path_for; ensure the change returns the same path used
when writing so the print line (which references report.saved_names and
args.key) can display the path without touching private store internals.

return 0


def main(argv: list[str] | None = None) -> int:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo_root="."
file="deepnote_toolkit/kernel_checkpoint/__main__.py"

echo "=== File exists? ==="
ls -l "$file" || true

echo "=== Showing around line 55-75 with numbers ==="
nl -ba "$file" | sed -n '50,85p'

Repository: deepnote/deepnote-toolkit

Length of output: 295


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="deepnote_toolkit/kernel_checkpoint/__main__.py"
echo "=== File exists? ==="
ls -l "$file" || true

echo "=== Showing around line 55-75 with numbers ==="
nl -ba "$file" | sed -n '50,85p'

echo "=== Search for 'Optional' usage in that file ==="
rg -n "Optional\[" "$file" || true

echo "=== Search for other PEP604 unions ' | None' in that file ==="
rg -n "\|\s*None" "$file" || true

Repository: deepnote/deepnote-toolkit

Length of output: 294


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="deepnote_toolkit/kernel_checkpoint/__main__.py"
echo "=== File exists? ==="
ls -l "$file" || true

echo "=== Showing around line 55-75 with numbers ==="
nl -ba "$file" | sed -n '50,85p'

echo "=== Imports in that file (top 40 lines) ==="
nl -ba "$file" | sed -n '1,45p'

echo "=== Search for 'Optional[' usage in that file ==="
rg -n "Optional\[" "$file" || true

echo "=== Search for PEP604 unions '| None' in that file ==="
rg -n "\|\s*None" "$file" || true

Repository: deepnote/deepnote-toolkit

Length of output: 294


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="deepnote_toolkit/kernel_checkpoint/__main__.py"
echo "=== File exists? ==="
ls -l "$file" || true

echo "=== Showing around line 55-75 with numbers ==="
nl -ba "$file" | sed -n '55,75p'

echo "=== Imports in that file (top 60 lines) ==="
nl -ba "$file" | sed -n '1,60p'

echo "=== Search for typing.Optional usage in repo (small scope) ==="
rg -n "from typing import Optional|Optional\[" deepnote_toolkit | head -n 20

echo "=== Search for PEP604 '| None' usage in same module area (deepnote_toolkit/kernel_checkpoint) ==="
rg -n "\|\s*None" deepnote_toolkit/kernel_checkpoint | head -n 20

Repository: deepnote/deepnote-toolkit

Length of output: 294


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="deepnote_toolkit/kernel_checkpoint/__main__.py"
echo "=== Line 1-120 with numbers (awk) ==="
awk 'NR>=1 && NR<=120 {printf "%d\t%s\n", NR, $0}' "$file"

echo "=== Exact main() signature line containing 'def main' ==="
rg -n "def main\(" "$file" || true

echo "=== Search for 'Optional[' usage in that file ==="
rg -n "Optional\[" "$file" || true

echo "=== Search for PEP604 '| None' usage in that file ==="
rg -n "\|\s*None" "$file" || true

echo "=== Imports in that file ==="
rg -n "^(from typing import|import typing|from __future__ import|from typing_extensions import)" "$file" || true

Repository: deepnote/deepnote-toolkit

Length of output: 3797


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "Optional\[" deepnote_toolkit | head -n 20
echo "----"
rg -n "\|\s*None" deepnote_toolkit | head -n 20
echo "----"
echo "Counts:"
python3 - <<'PY'
import re, subprocess, sys
def count(pattern):
    p=subprocess.run(["bash","-lc",f"rg -n {pattern!r} deepnote_toolkit | wc -l"],capture_output=True,text=True)
    return int(p.stdout.strip() or "0")
# basic counts
opt=count("Optional\\[")
union=count("\\|\\s*None")
print("Optional[ count:",opt)
print("| None count:",union)
PY

Repository: deepnote/deepnote-toolkit

Length of output: 2441


Use Optional[list[str]] for the nullable argv param (or align the whole module)
deepnote_toolkit/kernel_checkpoint/__main__.py:65 uses list[str] | None; this kernel_checkpoint package already uses PEP 604 (... | None) in store.py/checkpoint.py, so either switch to Optional[...] consistently there (incl. this function) or allow | None as the repo convention.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/kernel_checkpoint/__main__.py` at line 65, The function
signature for main uses the union syntax "list[str] | None" which is
inconsistent with the module's preferred use of typing.Optional; update the main
function signature to use Optional[list[str]] instead, add "from typing import
Optional" to the imports if missing, and ensure any other type hints in this
module follow the same Optional[...] convention for consistency with store.py
and checkpoint.py.

Comment on lines +123 to +127
payload = store.read(key)
if payload is None:
return None
restored: dict[str, Any] = dill.loads(payload)
return restored, RestoreReport(restored_names=list(restored.keys()), bytes_read=len(payload))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether checkpoint payload integrity/authenticity is implemented anywhere.
rg -n -C3 'dill\.loads|dill\.dumps|hmac|signature|sign|verify|sha256|blake2|cryptography|fernet' deepnote_toolkit

Repository: deepnote/deepnote-toolkit

Length of output: 15277


Authenticate checkpoint payloads before dill.loadsdeepnote_toolkit/kernel_checkpoint/checkpoint.py (lines 123-127) restores state via dill.loads(payload) directly from store.read(key). If the stored snapshot can be modified, this enables arbitrary code execution. Add integrity/authentication (e.g., signed payloads or an AEAD envelope) and fail closed before deserialization.

🧰 Tools
🪛 Ruff (0.15.14)

[error] 126-126: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/kernel_checkpoint/checkpoint.py` around lines 123 - 127, The
code currently calls dill.loads(payload) on untrusted data read via
store.read(key); instead add integrity/authentication verification before
deserialization: expect an authenticated envelope (e.g., AEAD ciphertext or a
signed payload with signature field) and verify it using your project crypto
utilities (or add HMAC/verify using a configured secret/public key) before
calling dill.loads; if verification fails, return None or raise and do not call
dill.loads. Update checkpoint.restore logic around payload, dill.loads, and
RestoreReport to parse/verify the envelope, decrypt/unwrap or reject invalid
payloads, and only then set restored and construct RestoreReport.

Comment on lines +43 to +46
def _path_for(self, key: str) -> Path:
# Forward slashes and colons would create unwanted directories; flatten.
safe = key.replace("/", "__").replace(":", "_")
return self._root / safe
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Unsafe key-to-path mapping can collide and traverse.

Line 45’s replacement scheme is non-injective ("a/b" and "a__b" collide) and still allows ".." as a path component. That can overwrite the wrong snapshot or write outside root.

Suggested fix
+import hashlib
+
     def _path_for(self, key: str) -> Path:
-        # Forward slashes and colons would create unwanted directories; flatten.
-        safe = key.replace("/", "__").replace(":", "_")
-        return self._root / safe
+        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
+        return self._root / f"{digest}.snapshot"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _path_for(self, key: str) -> Path:
# Forward slashes and colons would create unwanted directories; flatten.
safe = key.replace("/", "__").replace(":", "_")
return self._root / safe
def _path_for(self, key: str) -> Path:
digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
return self._root / f"{digest}.snapshot"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/kernel_checkpoint/store.py` around lines 43 - 46, The
_path_for method currently maps keys to filenames with simple replace, which is
non-injective and allows path-traversal; replace it by producing a
deterministic, filesystem-safe filename derived from a strong hash of the key
(e.g., sha256 hex) and optionally a short sanitized/truncated human-readable
suffix, ensure the filename contains no path separators or "..", join with
self._root, then resolve and assert the resolved path is inside self._root (use
Path.resolve() and Path.is_relative_to or compare parents) before returning to
prevent escapes; update references to _path_for accordingly.

Comment on lines +184 to +189
with pytest.raises(ValueError):
compute_checkpoint_key(project_id="", init_source_hash="h", environment_id="e")
with pytest.raises(ValueError):
compute_checkpoint_key(project_id="p", init_source_hash="", environment_id="e")
with pytest.raises(ValueError):
compute_checkpoint_key(project_id="p", init_source_hash="h", environment_id="")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten ValueError assertions with match=.

These checks are broad; they can pass for unrelated ValueErrors. Assert the expected message fragment.

Suggested fix
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="project_id must be a non-empty string"):
         compute_checkpoint_key(project_id="", init_source_hash="h", environment_id="e")
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="init_source_hash must be a non-empty string"):
         compute_checkpoint_key(project_id="p", init_source_hash="", environment_id="e")
-    with pytest.raises(ValueError):
+    with pytest.raises(ValueError, match="environment_id must be a non-empty string"):
         compute_checkpoint_key(project_id="p", init_source_hash="h", environment_id="")
🧰 Tools
🪛 Ruff (0.15.14)

[warning] 184-184: pytest.raises(ValueError) is too broad, set the match parameter or use a more specific exception

(PT011)


[warning] 186-186: pytest.raises(ValueError) is too broad, set the match parameter or use a more specific exception

(PT011)


[warning] 188-188: pytest.raises(ValueError) is too broad, set the match parameter or use a more specific exception

(PT011)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_kernel_checkpoint.py` around lines 184 - 189, The tests call
compute_checkpoint_key with empty inputs but currently use broad
pytest.raises(ValueError) which can catch unrelated errors; update each
assertion to use pytest.raises(ValueError, match="...") with a specific expected
message fragment (e.g., "project_id", "init_source_hash", and "environment_id")
so each raise asserts the correct validation path in compute_checkpoint_key.

@deepnote-bot
Copy link
Copy Markdown

🚀 Review App Deployment Started

📝 Description 🌐 Link / Info
🌍 Review application ra-105
🔑 Sign-in URL Click to sign-in
📊 Application logs View logs
🔄 Actions Click to redeploy
🚀 ArgoCD deployment View deployment
Last deployed 2026-05-28 13:17:14 (UTC)
📜 Deployed commit 592a7c1ee899a5b87ed77f9af97e3cd6210fcdf3
🛠️ Toolkit version deae9df

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants