
fix(errors): restore CUDA exception hierarchy to avoid slow string compilation #796

Merged: gmarkall merged 5 commits into NVIDIA:main from cpcloud:fix-slow-string-compile-pr on Feb 27, 2026

cpcloud commented Feb 18, 2026

Summary

  • Rebuild CUDA error classes in numba_cuda/numba/cuda/core/errors.py as CUDA-local subclasses of corresponding numba.core.errors classes.
  • Apply a ruff-mandated style fix in numba_cuda/numba/cuda/core/errors.py (_mod.NumbaError = NumbaError) with no behavior change.
  • Restore narrow CUDA-vs-core exception checks used by compile-time control flow while preserving compatibility for handlers that catch core error types.
  • Add regression coverage in numba_cuda/numba/cuda/tests/cudapy/test_errors.py that asserts hierarchy invariants across NumbaError descendants.
  • Sync pixi.lock package entries so local numba-cuda lock metadata matches the current 0.27.0 version.

Problem statement

The redirect in numba_cuda/numba/cuda/core/errors.py made CUDA error names direct aliases of numba.core.errors. That broadened exception checks in CUDA typing/overload resolution, so some compile paths (notably string-heavy cases) stopped short-circuiting and became much slower.
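The broadening effect can be illustrated with a minimal sketch. The class names and the `is_cuda_error` gate below are hypothetical stand-ins, not the actual `numba.core.errors` classes or the real compiler callsites; they only show how an alias and a subclass answer the same `isinstance` question differently.

```python
# Stand-ins for the upstream hierarchy (hypothetical names, not numba's real classes).
class CoreNumbaError(Exception):
    pass


class CoreTypingError(CoreNumbaError):
    pass


# Alias model: the CUDA name is the very same object as the core class.
AliasedCudaTypingError = CoreTypingError


# Subclass model: the CUDA name is a distinct, narrower class.
class LocalCudaTypingError(CoreTypingError):
    pass


def is_cuda_error(exc, cuda_cls):
    """Hypothetical compile-time gate: does this error belong to the CUDA side?"""
    return isinstance(exc, cuda_cls)


# An exception raised by upstream (core) code, not by CUDA code.
upstream_exc = CoreTypingError("raised by core typing")

aliased_result = is_cuda_error(upstream_exc, AliasedCudaTypingError)
local_result = is_cuda_error(upstream_exc, LocalCudaTypingError)

print(aliased_result)  # True: the alias matches upstream errors as well
print(local_result)    # False: the subclass matches only CUDA-local errors
```

Under the alias model the gate accepts upstream exceptions, which is the kind of broadened check described above. Under the subclass model the gate stays narrow, while a handler written as `except CoreTypingError` still catches `LocalCudaTypingError`.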

Alternatives considered

The options were evaluated on behavior restoration, compatibility, implementation effort, and maintenance.

Option A: Revert redirect and restore independent CUDA error classes

  • Mechanism: remove aliasing and reinstate pre-redirect CUDA-local class definitions.
  • Behavior impact: restores narrow isinstance behavior.
  • Compatibility impact: may drift from upstream numba.core.errors shape.
  • Maintenance profile: medium-high ongoing sync burden.

Option B: Keep aliasing and patch compiler callsites

  • Mechanism: keep shared error identity, patch callsites that rely on narrow CUDA-vs-core checks.
  • Behavior impact: restores targeted paths where patched.
  • Compatibility impact: preserves current alias model.
  • Maintenance profile: medium risk of future regressions as new callsites appear.

Option C: Manually mirror CUDA hierarchy as subclasses of core classes

  • Mechanism: explicitly define CUDA classes and inheritance mirroring numba.core.errors.
  • Behavior impact: restores narrow checks and subclass compatibility.
  • Compatibility impact: explicit and predictable.
  • Maintenance profile: medium-high upkeep as upstream classes evolve.

Option D: Dynamically mirror hierarchy at import time

  • Mechanism: generate CUDA-local subclasses for each core NumbaError descendant while preserving parent-child structure.
  • Behavior impact: restores narrow checks and keeps CUDA exceptions as subclasses of matching core exceptions.
  • Compatibility impact: preserves expected catch behavior for core exception handlers.
  • Maintenance profile: lower ongoing sync burden than manual hierarchy mirroring.
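As a rough illustration of Option D, the sketch below mirrors a small stand-in hierarchy at import time. It is not the code in `numba_cuda/numba/cuda/core/errors.py`; the core class names, `all_descendants`, and `mirror_hierarchy` are assumptions made for the example. Each local class inherits from its local parent(s) first and from the matching core class second.

```python
# Stand-in upstream hierarchy (hypothetical names, not numba's real classes).
class CoreError(Exception):
    pass


class CoreTypingError(CoreError):
    pass


class CoreLoweringError(CoreError):
    pass


class CoreUntypedAttributeError(CoreTypingError):
    pass


def all_descendants(root):
    """Snapshot root and every subclass before any mirror class exists."""
    seen = []
    stack = [root]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(current.__subclasses__())
    return seen


def mirror_hierarchy(core_root):
    """Return {core_cls: local_cls}, preserving the parent-child structure."""
    core_classes = all_descendants(core_root)
    mapping = {}

    def mirror(core_cls):
        if core_cls in mapping:
            return mapping[core_cls]
        # Local parents come first so the local branch wins in the MRO;
        # the core class comes last so core handlers still catch the error.
        local_parents = tuple(
            mirror(base)
            for base in core_cls.__bases__
            if issubclass(base, core_root)
        )
        bases = local_parents + (core_cls,)
        local_cls = type(core_cls.__name__, bases, {})
        mapping[core_cls] = local_cls
        return local_cls

    for core_cls in core_classes:
        mirror(core_cls)
    return mapping


LOCAL = mirror_hierarchy(CoreError)
LocalError = LOCAL[CoreError]
LocalTypingError = LOCAL[CoreTypingError]
```

The descendants are collected before any local class is created because each mirror is itself a subclass of a core class and would otherwise show up in `__subclasses__()` during the walk.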

Comparison at a glance

| Option | Restores narrow checks | Preserves core-compat catches | Initial effort | Ongoing upkeep |
| --- | --- | --- | --- | --- |
| A: Independent CUDA classes | Yes | Partial | Low | Medium-High |
| B: Callsite patches | Partial | Yes | Medium | Medium |
| C: Manual mirrored hierarchy | Yes | Yes | Medium-High | Medium-High |
| D: Dynamic mirrored hierarchy | Yes | Yes | Medium | Low-Medium |

Selected approach in this PR

This PR implements Option D and adds regression tests to lock in the hierarchy invariants:

  • issubclass(cuda_error, core_error) remains true for matching classes.
  • issubclass(core_error, numba.cuda.core.errors.NumbaError) remains false to preserve narrow compile-time gates.
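The two invariants could be checked along the following lines. This is a sketch against hand-written stand-in classes, not the test added in `numba_cuda/numba/cuda/tests/cudapy/test_errors.py`; the class names, `PAIRS`, and `run_invariants` are hypothetical.

```python
import unittest


# Stand-ins for the core classes (hypothetical, not numba.core.errors).
class CoreNumbaError(Exception):
    pass


class CoreTypingError(CoreNumbaError):
    pass


# Stand-ins for the CUDA-local classes, each a subclass of its core match.
class CudaNumbaError(CoreNumbaError):
    pass


class CudaTypingError(CudaNumbaError, CoreTypingError):
    pass


# (cuda_error, core_error) pairs that are expected to match.
PAIRS = [
    (CudaNumbaError, CoreNumbaError),
    (CudaTypingError, CoreTypingError),
]


class HierarchyInvariants(unittest.TestCase):
    def test_cuda_error_is_subclass_of_core_error(self):
        for cuda_error, core_error in PAIRS:
            with self.subTest(cuda_error=cuda_error.__name__):
                self.assertTrue(issubclass(cuda_error, core_error))

    def test_core_error_is_not_a_cuda_error(self):
        for _, core_error in PAIRS:
            with self.subTest(core_error=core_error.__name__):
                self.assertFalse(issubclass(core_error, CudaNumbaError))


def run_invariants():
    """Run the invariant checks and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HierarchyInvariants)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test covers compatibility for handlers that catch core error types; the second guards the narrow compile-time gates, since it fails as soon as a CUDA name becomes an alias of its core class.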

Closes #755

rparolin added this to the numba-cuda backlog milestone on Feb 18, 2026
cpcloud commented Feb 18, 2026

/ok to test


cpcloud and others added 4 commits February 20, 2026 14:38
…compilation (NVIDIA#755)

The error module redirect (491f552) replaced the CUDA exception
subclass hierarchy with identity aliases to numba.core.errors classes.
This broadened isinstance checks throughout the CUDA compiler, causing
it to catch upstream exceptions it previously ignored and try far more
compilation passes -- resulting in orders-of-magnitude slower compile
times for types with many overload candidates (e.g. strings).

Use dynamic diamond inheritance to create local exception subclasses:
for every upstream NumbaError descendant, a local class is created
that inherits from both the local parent and the upstream class.  This
restores the narrow isinstance semantics the compiler relies on while
preserving the user-facing catch semantics that 491f552 introduced.

Co-authored-by: Cursor <cursoragent@cursor.com>

Add a focused hierarchy invariants test to ensure CUDA error classes remain
local subclasses of core errors while preserving narrow core-vs-CUDA isinstance
behavior relied on by compile-time control flow.

Co-authored-by: Cursor <cursoragent@cursor.com>

Update local numba-cuda package entries in pixi.lock to the current 0.27.0
version so the lockfile metadata stays consistent with this branch.

Co-authored-by: Cursor <cursoragent@cursor.com>

Use direct attribute assignment for `NumbaError` on the redirected module to satisfy ruff while preserving existing error hierarchy behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
gmarkall commented

/ok to test

gmarkall added the "4 - Waiting on CI" (Waiting for a CI run to finish successfully) and "5 - Ready to merge" (Testing and reviews complete, ready to merge) labels and removed the "4 - Waiting on CI" label on Feb 27, 2026
gmarkall merged commit 5df7dcb into NVIDIA:main on Feb 27, 2026
206 of 208 checks passed

Labels

5 - Ready to merge Testing and reviews complete, ready to merge

Development

Successfully merging this pull request may close these issues.

[BUG] Slow compile times for string arguments in device functions (0.22 vs 0.24)
