Resolve forward-merge release/26.02 into main #1019

Merged
jameslamb merged 5 commits into rapidsai:main from jakirkham:main-merge-release/26.02
Feb 3, 2026
@jakirkham

Address merge conflicts found in the bot's forward merger PR: #1010

gforsyth and others added 5 commits January 26, 2026 12:04
## Description
We should be building packages when commits are merged into the
`release/` branches, otherwise projects can get stuck waiting for
nightlies. Additionally, some packages like `rapids-dask-dependency`
don't get built in the nightly runs.

xref: rapidsai/build-planning#224
)

While working on a couple of new things, I came across a few issues in the existing benchmark code. This PR:

- Fixes a bug that prevented benchmarks from being run on the GPU only via the `--no_cpu` command-line argument.
- Fixes a bug with replicated device names in the generated benchmark tables.
- Adds a new `CUCIM_BENCHMARK_MAX_DURATION` environment variable for setting the benchmark case duration without modifying the bash scripts.
- Stores any kwargs that were passed to the function in the benchmark table.
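
As a rough illustration of the environment-variable override described above, a benchmark script could read the value along these lines. This is a minimal sketch, not the code from this PR: the helper name `get_max_duration`, the fallback value, and the assumption that the duration is a positive number of seconds are all hypothetical.

```python
import os

# Hypothetical fallback used when the variable is unset (assumed to be seconds).
DEFAULT_MAX_DURATION = 1.0


def get_max_duration(default: float = DEFAULT_MAX_DURATION) -> float:
    """Return the benchmark case duration from the environment, or `default`."""
    raw = os.environ.get("CUCIM_BENCHMARK_MAX_DURATION")
    if raw is None or raw.strip() == "":
        # Variable not set: keep whatever the script would have used anyway.
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(
            f"CUCIM_BENCHMARK_MAX_DURATION must be a number, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError("CUCIM_BENCHMARK_MAX_DURATION must be positive")
    return value
```

With a helper like this, the duration can be changed per run by setting the variable in the shell, leaving the bash scripts untouched.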

Authors:
  - Gregory Lee (https://github.com/grlee77)
  - https://github.com/jakirkham

Approvers:
  - Gigon Bae (https://github.com/gigony)

URL: rapidsai#1002
…idsai#1015)

- Replace strlen() with strnlen() in cuimage.cpp to prevent potential
  buffer overread if strings are unexpectedly not null-terminated
- Add maximum length constraints for spacing_units (256 bytes) and
  coord_sys (16 bytes) based on expected string sizes
- Addresses SonarQube security analysis for safe C string handling
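
The actual change is in C++ (`cuimage.cpp`), but the difference between `strlen()` and `strnlen()` can be sketched in Python over a raw byte buffer. The function below is an illustrative emulation only, not code from the PR. The two limits come from the description above, while the constant names are hypothetical.

```python
# Limits taken from the description above; the constant names are hypothetical.
SPACING_UNITS_MAX_LEN = 256
COORD_SYS_MAX_LEN = 16


def strnlen(buf: bytes, maxlen: int) -> int:
    """Emulate C strnlen(): length up to the first NUL, capped at `maxlen`.

    Unlike strlen(), the scan never looks past `maxlen` bytes, so a buffer
    that is missing its terminator cannot cause an unbounded read.
    """
    end = buf.find(b"\x00", 0, maxlen)
    if end != -1:
        return end
    # No terminator within the bound: report the cap (or the buffer size).
    return min(maxlen, len(buf))
```

For a properly terminated string the result matches `strlen()`. For an unterminated buffer the result is capped at the bound instead of running off the end.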

Authors:
  - Gigon Bae (https://github.com/gigony)

Approvers:
  - Gregory Lee (https://github.com/grlee77)

URL: rapidsai#1015
This PR implements batch ROI decoding for cuslide2 using the native batch decoding API of nvImageCodec v0.7.0+.

### Background


This approach provides performance improvements by:
- amortizing GPU kernel launch overhead across multiple regions
- enabling parallel decoding of multiple ROIs
- reducing memory allocation overhead through batching

## Changes

### New Files

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.h`
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.cpp`
  - `NvImageCodecProcessor` class inheriting from `BatchDataProcessor`
  - Integrates with existing `ThreadBatchDataLoader` infrastructure
  - Supports both CPU and CUDA output devices

- `python/cucim/tests/unit/clara/test_batch_decoding.py`
  - Comprehensive test suite with 47 tests

### Modified Files

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.h`
  - Added `RoiRegion` and `BatchDecodeResult` structs
  - Added `decode_batch_regions_nvimgcodec()` function declaration

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.cpp`
  - Implemented `decode_batch_regions_nvimgcodec()` using:
    1. `nvimgcodecCodeStreamGetSubCodeStream()` with ROI for each region
    2. Single `nvimgcodecDecoderDecode()` call with all streams
    3. Batch result processing

- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/tiff/ifd.cpp`
  - Updated `IFD::read()` to use `ThreadBatchDataLoader` with `NvImageCodecProcessor`
  - Supports `num_workers`, `batch_size`, `prefetch_factor`, `shuffle`, `drop_last` parameters

- `cpp/plugins/cucim.kit.cuslide2/CMakeLists.txt`
  - Added new loader source files to build
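
To make the `batch_size`, `shuffle`, and `drop_last` parameters listed for `IFD::read()` more concrete, here is a minimal Python sketch of how a loader might group requested locations into batches. It assumes the conventional data-loader meaning of these parameters. The helper `iter_batches` and its `seed` argument are hypothetical and do not reflect the actual `ThreadBatchDataLoader` implementation.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

Location = Tuple[int, int]


def iter_batches(
    locations: Sequence[Location],
    batch_size: int,
    shuffle: bool = False,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[Location]]:
    """Yield lists of at most `batch_size` locations (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    order = list(locations)
    if shuffle:
        # Assumed semantics: randomize the visiting order before batching.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            # Assumed semantics: discard a trailing incomplete batch.
            break
        yield batch
```

Under these assumptions, three locations with `batch_size=2` produce one full batch plus a final batch of one, and `drop_last=True` discards that final partial batch.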

## Architecture

```
IFD::read()
    |
    +-- Single Location (location_len=1)
    |   +-- decode_ifd_region_nvimgcodec()
    |
    +-- Multiple Locations (location_len>1 or batch_size>1)
        +-- ThreadBatchDataLoader + NvImageCodecProcessor
            +-- decode_batch_regions_nvimgcodec()
                +-- nvimgcodecCodeStreamGetSubCodeStream() x N
                +-- nvimgcodecDecoderDecode() (single batch call)
```
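
The branching in the diagram above can be restated as a small Python sketch. The real dispatch lives in C++ inside `IFD::read()`. The function `choose_decode_path` is hypothetical and only returns the name of the decode function that the diagram associates with each branch.

```python
def choose_decode_path(location_len: int, batch_size: int = 1) -> str:
    """Return the decode function named in the diagram for the given request."""
    if location_len < 1:
        raise ValueError("at least one location is required")
    if location_len > 1 or batch_size > 1:
        # Batch path: ThreadBatchDataLoader + NvImageCodecProcessor.
        return "decode_batch_regions_nvimgcodec"
    # Single location with the default batch size: direct decode.
    return "decode_ifd_region_nvimgcodec"
```

Read this way, a single location still takes the batch path whenever `batch_size` is greater than 1, which is what the `location_len>1 or batch_size>1` condition in the diagram implies.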

## Test Results

All 47 tests passing:

| Test Category | Compression Types | Count | Status |
|---------------|-------------------|-------|--------|
| TestBatchDecoding (CPU) | JPEG, Deflate, Raw | 21 | PASS |
| TestBatchDecodingCUDA | JPEG | 2 | PASS |
| TestBatchDecodingPerformance | JPEG, Deflate, Raw | 24 | PASS |

**Note:** CUDA output is only supported for JPEG compression. Deflate and Raw use CPU decoding with optional GPU memory transfer.



## How to Run Tests

```bash
# Run all batch decoding tests
cd cucim
pytest python/cucim/tests/unit/clara/test_batch_decoding.py -v

# Run specific test categories
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecoding -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingCUDA -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingPerformance -v
```

## Example Usage

```python
from cucim import CuImage
import numpy as np

# Open TIFF file
img = CuImage("slide.tiff")

# Batch decode multiple locations
locations = [(0, 0), (256, 256), (512, 512), (768, 768)]
size = (256, 256)

# CPU output with parallel workers
for region in img.read_region(locations, size, level=0, num_workers=4):
    arr = np.asarray(region)
    print(f"Decoded: {arr.shape}")

# CUDA output (JPEG only)
import cupy as cp
for region in img.read_region(locations, size, level=0, num_workers=4, device="cuda"):
    arr = cp.asarray(region)
    print(f"GPU decoded: {arr.shape}")
```

Authors:
  - https://github.com/cdinea
  - https://github.com/jakirkham

Approvers:
  - Gregory Lee (https://github.com/grlee77)
  - Gigon Bae (https://github.com/gigony)
  - https://github.com/jakirkham

URL: rapidsai#1007
@jakirkham jakirkham requested review from a team as code owners January 29, 2026 01:41
@jakirkham jakirkham requested a review from bdice January 29, 2026 01:41
@jakirkham jakirkham added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Jan 29, 2026
@jameslamb jameslamb merged commit 2e72a09 into rapidsai:main Feb 3, 2026
58 checks passed
@jakirkham jakirkham deleted the main-merge-release/26.02 branch February 3, 2026 20:00