Releases: NVIDIA/numba-cuda
v0.30.0
What's Changed
- Test cuDF third party test `test_groupby_apply_return_reindexed_series` by @mroeschke in #823
- Move cuda-pathfinder dependency to core by @ZzEeKkAa in #835
- Fix cache invalidation logic. by @tpn in #800
- Implement launch config infrastructure. by @tpn in #804
- Use cuda-pathfinder to locate nvdisasm by @Jlisowskyy in #842
- Fix CABI calling convention: skip env global and escape mangled names by @isVoid in #844
- Bump Version to 0.30.0 by @isVoid in #848
New Contributors
- @mroeschke made their first contribution in #823
Full Changelog: v0.29.0...v0.30.0
v0.29.0
What's Changed
- Extend dbg.value coverage to loadvar for scalar kernel parameters by @jiel-nv in #813
- Fix FP8 uint64 cast flake on Windows by @cpcloud in #829
- Use dbg.declare for scalar kernel parameters by @cpcloud in #828
- Fix mixed-IR liveness for inline overload DCE by @cpcloud in #795
- Use `cuda-python` for `nvvm` bindings by @brandon-b-miller in #818
- fix(ci): cudaRoundMode typing failure in FP8 test by @kaeun97 in #834
- Support cuda_bindings FastEnum by @mdboom in #837
- Support cuda.core.GraphBuilder as a kernel-launch stream by @Andy-Jost in #836
- fix: normalize numpy integer types to python int to prevent overflow errors by @kaeun97 in #774
- Bump Version to 0.29.0 by @isVoid in #838
New Contributors
- @Andy-Jost made their first contribution in #836
Full Changelog: v0.28.2...v0.29.0
v0.28.2
What's Changed
- build(deps): bump the actions-monthly group with 2 updates by @dependabot[bot] in #815
- Swallow DLPack conversion error by @leofang in #825
- Fix empty DW_AT_location for reused loop variables by @jiel-nv in #811
- Bump version to 0.28.2 by @brandon-b-miller in #827
Full Changelog: v0.28.1...v0.28.2
v0.28.1
v0.28.0
What's Changed
- Fix CI checks job to also fail on dependency failures by @kkraus14 in #777
- Remove `enums` and unused ctypes code that required it by @brandon-b-miller in #775
- Clear `rtsys` during context reset by @brandon-b-miller in #783
- Fix pnputil to only restart NVIDIA display adapters by @kkraus14 in #779
- Bump `cudf` version to `26.02` in thirdparty tests by @brandon-b-miller in #785
- Remove unused cudf patch by @gmarkall in #786
- Fix np.dtype overload signature drift by @cpcloud in #797
- Use only public `.handle` to access `cuda-core` objects by @brandon-b-miller in #794
- Remove most of `drvapi.py` in favor of direct `cuda-python` usage by @brandon-b-miller in #784
- Find CUDA headers via pathfinder by @brandon-b-miller in #771
- Fix incorrect DWARF type encodings for i8 and discriminator types by @jiel-nv in #806
- Fix int64 elements DWARF encoding in UniTuple types by @jiel-nv in #807
- Unpin nvjitlink from ctk version by @ZzEeKkAa in #809
- Disable legalize return type for inline always by @ZzEeKkAa in #812
- Fix mixed C ABI / Numba ABI internal-call lowering (#781, #789) by @isVoid in #782
- Aligned NumPy dtypes caching by @Jlisowskyy in #792
- fix(errors): restore CUDA exception hierarchy to avoid slow string compilation by @cpcloud in #796
- Revert exception hierarchy unification (#634) and its fix (#796) by @cpcloud in #816
- Bump version to 0.28.0 by @gmarkall in #817
New Contributors
- @Jlisowskyy made their first contribution in #792
Full Changelog: v0.27.0...v0.28.0
v0.27.0
What's Changed
- remove super args by @cpcloud in #763
- test(refactor): clean up `run_in_subprocess` by @cpcloud in #762
- Disable automatic review trigger for Greptile by @gmarkall in #743
- Enable apt proxy caching; skip hosted Windows builds by @kkraus14 in #766
- build(deps): bump actions/setup-python from 6.1.0 to 6.2.0 in the actions-monthly group across 1 directory by @dependabot[bot] in #768
- Add `cuda-core` to `oldest` tests by @brandon-b-miller in #769
- Generate line info for PHI exporters in terminator block by @jiel-nv in #756
- Move `CallConv` from `CUDAContext` to `FunctionDescriptor` by @isVoid in #717
- feat: Add documentation for debugging Numba CUDA programs with CUDA GDB and VSCode by @mmason-nvidia in #665
- Remove unused `rtapi.py` by @brandon-b-miller in #773
- fix: fix boolean return type mismatch in C ABI wrapper by @kaeun97 in #770
- Add CUDA FP8 type + conversion bindings (E5M2/E4M3/E8M0), HW-accel detection, and comprehensive tests by @isVoid in #686
- Bump version to 0.27.0 by @gmarkall in #776
Full Changelog: v0.26.0...v0.27.0
v0.26.0
What's Changed
- Eliminate duplicate DWARF entries for boolean kernel parameters by @jiel-nv in #749
- feat: accept `cuda.core.Buffer` and `cuda.core.utils.StridedMemoryView` as kernel inputs by @cpcloud in #751
- Replace legacy wheels-build.yaml with build-wheel.yml in publish workflow [no-ci] by @kkraus14 in #760
- MatMul test: Move from unittest to fully pytest by @maifeeulasad in #754
- ci: move benchmarks to single function calls so that the units and results are easier to interpret by @cpcloud in #759
- CI cleanup, extend no-ci skip logic, and add GitHub Release uploads by @kkraus14 in #761
- bump pixi version and relock by @cpcloud in #757
- Bump version to 0.26.0 by @kkraus14 in #764
New Contributors
- @maifeeulasad made their first contribution in #754
Full Changelog: v0.25.0...v0.26.0
v0.25.0
What's Changed
- build(deps): bump the actions-monthly group across 1 directory with 8 updates by @dependabot[bot] in #704
- chore(dev): build pixi using rattler by @cpcloud in #713
- [feat] Initial version of the Numba CUDA GDB pretty-printer by @mmason-nvidia in #692
- revert: chore(dev): build pixi using rattler (#713) by @cpcloud in #719
- Fix DISubprogram line number to point to function definition line by @jiel-nv in #695
- chore(deps): regenerate pixi lockfile by @cpcloud in #722
- Disable per-PR nvmath tests + follow same test practice by @leofang in #723
- Adding `pixi run test` and `pixi run test-par` support by @rparolin in #724
- CI: Add CUDA 13.1 testing support by @Copilot in #705
- Use `pathfinder` for dynamic libraries by @brandon-b-miller in #308
- ci: remove rapids containers from conda ci by @cpcloud in #737
- Pass the -numba-debug flag to libnvvm by @mmason-nvidia in #681
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed by @kkraus14 in #739
- feat: users can pass `shared_memory_carveout` to @cuda.jit by @kaeun97 in #642
- ci: run tests in parallel by @cpcloud in #740
- fix: Fix race condition in CUDA Simulator by @ccam80 in #690
- fix: enable flake8-bugbear lints and fix found problems by @cpcloud in #708
- chore(deps): add cuda-pathfinder to pixi deps by @cpcloud in #741
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code by @mmason-nvidia in #698
- Fix missing line info in Jupyter notebooks by @jiel-nv in #742
- Fix kernel return type in DISubroutineType debug metadata by @jiel-nv in #745
- Fix prologue debug line info pointing to decorator instead of def line by @jiel-nv in #746
- Fix max block size computation in `forall` by @brandon-b-miller in #744
- feat: swap out internal device array usage with `StridedMemoryView` by @cpcloud in #703
- Add Python 3.14 to the wheel publishing matrix by @gmarkall in #750
New Contributors
- @mmason-nvidia made their first contribution in #692
- @ccam80 made their first contribution in #690
Full Changelog: v0.24.0...v0.25.0
v0.24.0
What's Changed
- Set up a new VM-based CI infrastructure by @leofang in #604
- chore(dev-deps): remove ipython and pyinstrument by @cpcloud in #670
- Use `rapidsai/sccache` in CI by @trxcllnt in #674
- Remove dangling references to NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY by @brandon-b-miller in #675
- Drop `experimental` from cuda.core namespace imports by @brandon-b-miller in #676
- Remove customized address space tracking and address class emission in debug info by @jiel-nv in #669
- Support python `3.14` by @brandon-b-miller in #599
- fix: use freethreading-supported `_PySet_NextItemRef` where possible by @cpcloud in #682
- chore(deps): bump deps in pixi lockfile by @cpcloud in #693
- chore: perf lint by @cpcloud in #697
- perf: let CAI fall through instead of calling from_cuda_array_interface by @cpcloud in #694
- perf: remove some exception control flow and buffer-exception penalization for arrays by @cpcloud in #700
- Fix `test_wheel_deps_wheels.sh` to actually uninstall `nvvm` and `nvrtc` packages for CUDA 13 by @brandon-b-miller in #701
- Dropping bits in the old CI & Propagating recent changes from cuda-python by @leofang in #683
- chore(deps): bump numba-cuda version and relock pixi by @cpcloud in #707
- ci: remove redundant conda build in ci by @cpcloud in #711
- ci: relock pixi by @cpcloud in #712
- chore: disable `locked` flag to bypass prefix-dev/pixi#5256 by @cpcloud in #714
- Add arch specific target support by @ZzEeKkAa in #549
New Contributors
Full Changelog: v0.23.0...v0.24.0
v0.23.0
What's Changed
- refactor: remove devicearray code to reduce complexity by @cpcloud in #600
- chore: bump version in pixi.toml by @cpcloud in #641
- feat: add set_shared_memory_carveout by @kaeun97 in #629
- Migrate numba-cuda driver to use cuda.core.launch API by @rparolin in #609
- refactor: cull dead linker objects by @cpcloud in #649
- Add support for dependabot by @jpascucci-nv in #647
- Fix false negative NRT link decision when NRT was previously toggled on by @brandon-b-miller in #650
- test: fix bogus `self` argument to `Context` by @cpcloud in #656
- Only run dependabot monthly and open fewer PRs by @jpascucci-nv in #658
- feat: add print support for int64 tuples by @cpcloud in #663
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests by @brandon-b-miller in #662
- Test RAPIDS 25.12 by @brandon-b-miller in #661
- build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #652
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 by @dependabot[bot] in #655
- feat: allow printing nested tuples by @cpcloud in #667
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo by @gmarkall in #591
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected by @gmarkall in #643
- Capture global device arrays in kernels and device functions by @shwina in #666
New Contributors
- @jpascucci-nv made their first contribution in #647
- @dependabot[bot] made their first contribution in #652
- @shwina made their first contribution in #666
Full Changelog: v0.22.1...v0.23.0