
Forward-merge release/26.02 into main #1010

Merged
jameslamb merged 4 commits into main from release/26.02 on Feb 3, 2026

Conversation

@rapids-bot rapids-bot bot commented Jan 26, 2026

Forward-merge triggered by a push to release/26.02 that creates a PR to keep main up-to-date. If this PR cannot be merged immediately due to conflicts, it will remain open for the team to merge manually. See the forward-merger docs for more info.

## Description
We should be building packages when commits are merged into the `release/` branches; otherwise, projects can get stuck waiting for nightlies. Additionally, some packages like `rapids-dask-dependency` don't get built in the nightly runs.

xref: rapidsai/build-planning#224
@rapids-bot rapids-bot bot requested a review from a team as a code owner January 26, 2026 19:04
@rapids-bot rapids-bot bot requested a review from gforsyth January 26, 2026 19:04
rapids-bot bot commented Jan 26, 2026

FAILURE - Unable to forward-merge due to an error; a manual merge is necessary. Do not use the Resolve conflicts option in this PR; follow these instructions: https://docs.rapids.ai/maintainers/forward-merger/

IMPORTANT: When merging this PR, do not use the auto-merger (i.e., the `/merge` comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches will become incompatible.

grlee77 and others added 2 commits January 27, 2026 15:21
While working on a couple of new things I came across a few issues in the existing benchmark code. This PR:

- Fixes a bug that prevented benchmarks from being run on the GPU only via the `--no_cpu` command-line argument.
- Fixes a bug with replicated device names in the generated benchmark tables.
- Adds a new `CUCIM_BENCHMARK_MAX_DURATION` environment variable for setting the benchmark case duration without modifying the bash scripts.
- Stores any kwargs that were passed to the function in the benchmark table.
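
A minimal sketch of how a benchmark script might read the new environment variable; the helper name and default value below are illustrative, not the actual benchmark code:

```python
import os

# Hypothetical helper: read CUCIM_BENCHMARK_MAX_DURATION (in seconds) with a
# fallback default, so the duration can be changed without editing the bash scripts.
def get_max_duration(default: float = 10.0) -> float:
    value = os.environ.get("CUCIM_BENCHMARK_MAX_DURATION")
    return float(value) if value else default

print(f"Each benchmark case will run for at most {get_max_duration()} s")
```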

Authors:
  - Gregory Lee (https://github.com/grlee77)
  - https://github.com/jakirkham

Approvers:
  - Gigon Bae (https://github.com/gigony)

URL: #1002
- Replace `strlen()` with `strnlen()` in `cuimage.cpp` to prevent a potential buffer overread if strings are unexpectedly not null-terminated.
- Add maximum length constraints for `spacing_units` (256 bytes) and `coord_sys` (16 bytes) based on expected string sizes.
- Address SonarQube security analysis findings for safe C string handling.

Authors:
  - Gigon Bae (https://github.com/gigony)

Approvers:
  - Gregory Lee (https://github.com/grlee77)

URL: #1015
@rapids-bot rapids-bot bot requested a review from a team as a code owner January 28, 2026 16:47
This PR implements batch ROI decoding for cuslide2 using nvImageCodec v0.7.0+'s native batch decoding API.

### Background

Decoding multiple ROIs through nvImageCodec's native batch API improves performance by:
- amortizing GPU kernel launch overhead across multiple regions
- enabling parallel decoding of multiple ROIs
- reducing memory allocation overhead through batching
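
As a rough illustration of where the batching helps, compare issuing one `read_region()` call per ROI with a single call over a list of locations. This is only a sketch using cuCIM's Python API with a placeholder file name; actual timings depend on the image, codec, and GPU:

```python
import time
from cucim import CuImage

img = CuImage("slide.tiff")  # placeholder path
locations = [(x, x) for x in range(0, 4096, 256)]
size = (256, 256)

# N independent calls: every ROI pays its own decode setup and launch overhead.
t0 = time.perf_counter()
for loc in locations:
    img.read_region(loc, size, level=0)
t_loop = time.perf_counter() - t0

# One batched call: setup and kernel launches are shared across all ROIs.
t0 = time.perf_counter()
for _ in img.read_region(locations, size, level=0, num_workers=4):
    pass
t_batch = time.perf_counter() - t0

print(f"per-ROI loop: {t_loop:.3f}s, batched: {t_batch:.3f}s")
```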

## Changes

### New Files

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.h`
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.cpp`
  - `NvImageCodecProcessor` class inheriting from `BatchDataProcessor`
  - Integrates with existing `ThreadBatchDataLoader` infrastructure
  - Supports both CPU and CUDA output devices

- `python/cucim/tests/unit/clara/test_batch_decoding.py`
  - Comprehensive test suite with 47 tests

### Modified Files

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.h`
  - Added `RoiRegion` and `BatchDecodeResult` structs
  - Added `decode_batch_regions_nvimgcodec()` function declaration

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.cpp`
  - Implemented `decode_batch_regions_nvimgcodec()` using:
    1. `nvimgcodecCodeStreamGetSubCodeStream()` with ROI for each region
    2. Single `nvimgcodecDecoderDecode()` call with all streams
    3. Batch result processing

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/tiff/ifd.cpp`
  - Updated `IFD::read()` to use `ThreadBatchDataLoader` with `NvImageCodecProcessor`
  - Supports `num_workers`, `batch_size`, `prefetch_factor`, `shuffle`, `drop_last` parameters

- `cpp/plugins/cucim.kit.cuslide2/CMakeLists.txt`
  - Added new loader source files to build

## Architecture

```
IFD::read()
    |
    +-- Single Location (location_len=1)
    |   +-- decode_ifd_region_nvimgcodec()
    |
    +-- Multiple Locations (location_len>1 or batch_size>1)
        +-- ThreadBatchDataLoader + NvImageCodecProcessor
            +-- decode_batch_regions_nvimgcodec()
                +-- nvimgcodecCodeStreamGetSubCodeStream() x N
                +-- nvimgcodecDecoderDecode() (single batch call)
```
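
From the Python API, the two branches above correspond roughly to passing a single location versus a list of locations. A sketch with a placeholder file name; shapes are examples for an RGB slide:

```python
from cucim import CuImage
import numpy as np

img = CuImage("slide.tiff")  # placeholder path

# location_len == 1: the direct decode_ifd_region_nvimgcodec() branch;
# read_region() returns the decoded region itself.
region = img.read_region((0, 0), (256, 256), level=0)
print(np.asarray(region).shape)  # e.g. (256, 256, 3)

# location_len > 1 (or batch_size > 1): the ThreadBatchDataLoader +
# NvImageCodecProcessor branch; read_region() returns an iterable of regions.
locations = [(0, 0), (256, 256), (512, 512)]
for region in img.read_region(locations, (256, 256), level=0, num_workers=2):
    print(np.asarray(region).shape)
```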

## Test Results

All 47 tests passing:

| Test Category | Compression Types | Count | Status |
|---------------|-------------------|-------|--------|
| TestBatchDecoding (CPU) | JPEG, Deflate, Raw | 21 | PASS |
| TestBatchDecodingCUDA | JPEG | 2 | PASS |
| TestBatchDecodingPerformance | JPEG, Deflate, Raw | 24 | PASS |

**Note:** CUDA output is only supported for JPEG compression. Deflate and Raw use CPU decoding with optional GPU memory transfer.



## How to Run Tests

```bash
# Run all batch decoding tests
cd cucim
pytest python/cucim/tests/unit/clara/test_batch_decoding.py -v

# Run specific test categories
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecoding -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingCUDA -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingPerformance -v
```

## Example Usage

```python
from cucim import CuImage
import numpy as np

# Open TIFF file
img = CuImage("slide.tiff")

# Batch decode multiple locations
locations = [(0, 0), (256, 256), (512, 512), (768, 768)]
size = (256, 256)

# CPU output with parallel workers
for region in img.read_region(locations, size, level=0, num_workers=4):
    arr = np.asarray(region)
    print(f"Decoded: {arr.shape}")

# CUDA output (JPEG only)
import cupy as cp
for region in img.read_region(locations, size, level=0, num_workers=4, device="cuda"):
    arr = cp.asarray(region)
    print(f"GPU decoded: {arr.shape}")
```
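
A follow-up sketch for the loader parameters listed under Modified Files (`batch_size`, `drop_last`, etc.), assuming the existing cuCIM loader behavior where `batch_size > 1` makes each yielded item a stacked batch:

```python
from cucim import CuImage
import numpy as np

img = CuImage("slide.tiff")
locations = [(0, 0), (256, 256), (512, 512), (768, 768), (1024, 1024)]
size = (256, 256)

# batch_size=2 groups ROIs per yielded item; drop_last=True skips the
# final partial batch (here, the fifth location).
loader = img.read_region(
    locations, size, level=0, num_workers=4, batch_size=2, drop_last=True
)
for batch in loader:
    arr = np.asarray(batch)
    print(arr.shape)  # leading dimension is the batch size, e.g. (2, 256, 256, 3)
```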

Authors:
  - https://github.com/cdinea
  - https://github.com/jakirkham

Approvers:
  - Gregory Lee (https://github.com/grlee77)
  - Gigon Bae (https://github.com/gigony)
  - https://github.com/jakirkham

URL: #1007
@jakirkham

Fixing the forward-merger in PR #1019.

@jameslamb jameslamb merged commit 3fe2eed into main Feb 3, 2026
420 of 430 checks passed