## Description
We should be building packages when commits are merged into the `release/` branches, otherwise projects can get stuck waiting for nightlies. Additionally, some packages like `rapids-dask-dependency` don't get built in the nightly runs.
xref: rapidsai/build-planning#224
FAILURE - Unable to forward-merge due to an error, manual merge is necessary. Do not use the
IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the
While working on a couple of new things I came across a few issues in the existing benchmark code. This PR:
- Fixes a bug that prevented benchmarks from being run on the GPU only via the `--no_cpu` command-line argument.
- Fixes a bug with replicated device names in the generated benchmark tables.
- Adds a new `CUCIM_BENCHMARK_MAX_DURATION` environment variable for setting the benchmark case duration without modifying the bash scripts.
- Stores any kwargs that were passed to the function in the benchmark table.
Authors:
- Gregory Lee (https://github.com/grlee77)
- https://github.com/jakirkham
Approvers:
- Gigon Bae (https://github.com/gigony)
URL: #1002
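As a rough illustration of how a benchmark script might pick up `CUCIM_BENCHMARK_MAX_DURATION`, here is a minimal sketch. The helper name `get_max_duration`, the default value, and the interpretation of the value as seconds are all assumptions made for this example; the actual benchmark code may handle the variable differently.

```python
import os

# Hypothetical fallback used when the variable is unset or invalid.
DEFAULT_MAX_DURATION = 1.0


def get_max_duration(default: float = DEFAULT_MAX_DURATION) -> float:
    """Read the benchmark duration from the environment, falling back to a default."""
    raw = os.environ.get("CUCIM_BENCHMARK_MAX_DURATION")
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric input: ignore it rather than abort the benchmark run.
        return default
    # Reject zero or negative durations.
    return value if value > 0 else default
```

With this pattern the duration can be changed per run (for example `CUCIM_BENCHMARK_MAX_DURATION=5 python some_benchmark.py`, where the script name is a placeholder) without editing any script.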
- Replace `strlen()` with `strnlen()` in `cuimage.cpp` to prevent a potential buffer overread if strings are unexpectedly not null-terminated
- Add maximum length constraints for `spacing_units` (256 bytes) and `coord_sys` (16 bytes) based on expected string sizes
- Addresses SonarQube security analysis for safe C string handling
Authors:
- Gigon Bae (https://github.com/gigony)
Approvers:
- Gregory Lee (https://github.com/grlee77)
URL: #1015
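The difference between `strlen()` and `strnlen()` can be modeled in a few lines of Python. This is only a sketch of the bounded-scan semantics, not the C++ change itself; the constant names are hypothetical, while the limits (256 and 16 bytes) come from the description above.

```python
# Limits taken from the change description; the constant names are made up.
MAX_SPACING_UNITS_LEN = 256
MAX_COORD_SYS_LEN = 16


def strnlen(buf: bytes, maxlen: int) -> int:
    """Model C strnlen(): length up to the first NUL byte, never more than maxlen."""
    limit = min(maxlen, len(buf))
    # Search for a NUL terminator only within the first `limit` bytes.
    idx = buf.find(b"\x00", 0, limit)
    return limit if idx == -1 else idx


# A terminated string reports its real length.
print(strnlen(b"RAS\x00garbage", MAX_COORD_SYS_LEN))
# An unterminated buffer is capped at the limit instead of scanning further.
print(strnlen(b"A" * 32, MAX_COORD_SYS_LEN))
```

The second call shows the point of the change: without a terminator, the scan stops at the stated maximum instead of reading past the buffer.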
This PR implements batch ROI decoding for cuslide2 using the native batch decoding API of nvImageCodec v0.7.0+.
### Background
This approach provides performance improvements by:
- amortizing GPU kernel launch overhead across multiple regions
- enabling parallel decoding of multiple ROIs
- reducing memory allocation overhead through batching
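To make the amortization argument concrete, the sketch below only counts decode calls; it does not measure anything and makes no claim about actual speedups. The function name is hypothetical, and `batch_size` is assumed to mean the number of regions submitted per decode call.

```python
import math


def decode_call_count(num_regions: int, batch_size: int) -> int:
    """Number of decode calls needed when regions are submitted in batches."""
    if num_regions < 0 or batch_size < 1:
        raise ValueError("num_regions must be >= 0 and batch_size >= 1")
    return math.ceil(num_regions / batch_size)


# One call per region versus batched submission of the same 100 regions.
print(decode_call_count(100, 1), decode_call_count(100, 32))
```

Any fixed per-call cost (kernel launch, allocation setup) is paid once per call, so fewer calls means that cost is spread over more regions.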
## Changes
### New Files
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.h`
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/loader/nvimgcodec_processor.cpp`
  - `NvImageCodecProcessor` class inheriting from `BatchDataProcessor`
  - Integrates with existing `ThreadBatchDataLoader` infrastructure
  - Supports both CPU and CUDA output devices
- `python/cucim/tests/unit/clara/test_batch_decoding.py`
  - Comprehensive test suite with 47 tests
### Modified Files
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.h`
  - Added `RoiRegion` and `BatchDecodeResult` structs
  - Added `decode_batch_regions_nvimgcodec()` function declaration
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/nvimgcodec/nvimgcodec_decoder.cpp`
  - Implemented `decode_batch_regions_nvimgcodec()` using:
    1. `nvimgcodecCodeStreamGetSubCodeStream()` with ROI for each region
    2. Single `nvimgcodecDecoderDecode()` call with all streams
    3. Batch result processing
- `cpp/plugins/cucim.kit.cuslide2/src/cuslide/tiff/ifd.cpp`
  - Updated `IFD::read()` to use `ThreadBatchDataLoader` with `NvImageCodecProcessor`
  - Supports `num_workers`, `batch_size`, `prefetch_factor`, `shuffle`, `drop_last` parameters
- `cpp/plugins/cucim.kit.cuslide2/CMakeLists.txt`
  - Added new loader source files to build
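The three-step flow listed for `decode_batch_regions_nvimgcodec()` can be sketched in Python with stand-in types. Everything here is illustrative: the field names on `RoiRegion` and `BatchDecodeResult` are assumptions, and the two callables stand in for the nvImageCodec sub-stream and decode calls rather than wrapping them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class RoiRegion:
    # Field names are assumptions; the C++ struct may be laid out differently.
    x: int
    y: int
    width: int
    height: int


@dataclass
class BatchDecodeResult:
    region: RoiRegion
    success: bool
    data: bytes = b""


def decode_batch_regions(
    regions: Sequence[RoiRegion],
    make_sub_stream: Callable[[RoiRegion], object],
    decode_all: Callable[[List[object]], List[Optional[bytes]]],
) -> List[BatchDecodeResult]:
    # Step 1: create one ROI-restricted sub-stream per region.
    streams = [make_sub_stream(region) for region in regions]
    # Step 2: issue a single decode call covering every stream.
    outputs = decode_all(streams) if streams else []
    # Step 3: pair each output with its region and record success or failure.
    return [
        BatchDecodeResult(region, out is not None, out or b"")
        for region, out in zip(regions, outputs)
    ]
```

Passing the two operations in as callables keeps the sketch runnable without a GPU and makes it easy to check that only one decode call is issued per batch.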
## Architecture
```
IFD::read()
    |
    +-- Single Location (location_len=1)
    |   +-- decode_ifd_region_nvimgcodec()
    |
    +-- Multiple Locations (location_len>1 or batch_size>1)
        +-- ThreadBatchDataLoader + NvImageCodecProcessor
            +-- decode_batch_regions_nvimgcodec()
                +-- nvimgcodecCodeStreamGetSubCodeStream() x N
                +-- nvimgcodecDecoderDecode() (single batch call)
```
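The branching in the diagram above can be expressed as a small dispatch function. This is a sketch of the condition as drawn, not the actual `IFD::read()` code; the function name and the returned labels are hypothetical.

```python
def choose_decode_path(location_len: int, batch_size: int = 1) -> str:
    """Return which branch of the diagram a request would take."""
    if location_len < 1 or batch_size < 1:
        raise ValueError("location_len and batch_size must be >= 1")
    # Multiple locations or an explicit batch size go through the batch loader.
    if location_len > 1 or batch_size > 1:
        return "batch"
    # A single location with the default batch size uses the single-region path.
    return "single"
```

Here `"single"` corresponds to `decode_ifd_region_nvimgcodec()` and `"batch"` to the `ThreadBatchDataLoader` path that ends in `decode_batch_regions_nvimgcodec()`.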
## Test Results
All 47 tests passing:
| Test Category | Compression Types | Count | Status |
|---------------|-------------------|-------|--------|
| TestBatchDecoding (CPU) | JPEG, Deflate, Raw | 21 | PASS |
| TestBatchDecodingCUDA | JPEG | 2 | PASS |
| TestBatchDecodingPerformance | JPEG, Deflate, Raw | 24 | PASS |
**Note:** CUDA output is only supported for JPEG compression. Deflate and Raw use CPU decoding with optional GPU memory transfer.
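One way to read the note is as a small decision table. The sketch below encodes that reading; the function name, the compression strings, and the returned tuple are assumptions for illustration and do not mirror any actual cuCIM API.

```python
from typing import Tuple


def plan_decode(compression: str, device: str = "cpu") -> Tuple[str, bool]:
    """Return (decode_device, needs_gpu_transfer) for a requested output device."""
    compression = compression.lower()
    if compression not in {"jpeg", "deflate", "raw"}:
        raise ValueError(f"unsupported compression: {compression}")
    if device == "cpu":
        return ("cpu", False)
    if device != "cuda":
        raise ValueError(f"unknown device: {device}")
    if compression == "jpeg":
        # JPEG is the only case described as decoding directly to CUDA output.
        return ("cuda", False)
    # Deflate and Raw decode on the CPU; the result may then be copied to the GPU.
    return ("cpu", True)
```

Under this reading, requesting CUDA output for Deflate or Raw data still works, but the decode itself stays on the CPU.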
## How to Run Tests
```bash
# Run all batch decoding tests
cd cucim
pytest python/cucim/tests/unit/clara/test_batch_decoding.py -v
# Run specific test categories
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecoding -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingCUDA -v
pytest python/cucim/tests/unit/clara/test_batch_decoding.py::TestBatchDecodingPerformance -v
```
## Example Usage
```python
from cucim import CuImage
import numpy as np

# Open TIFF file
img = CuImage("slide.tiff")

# Batch decode multiple locations
locations = [(0, 0), (256, 256), (512, 512), (768, 768)]
size = (256, 256)

# CPU output with parallel workers
for region in img.read_region(locations, size, level=0, num_workers=4):
    arr = np.asarray(region)
    print(f"Decoded: {arr.shape}")

# CUDA output (JPEG only)
import cupy as cp

for region in img.read_region(locations, size, level=0, num_workers=4, device="cuda"):
    arr = cp.asarray(region)
    print(f"GPU decoded: {arr.shape}")
```
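The `locations` list in the example above is hard-coded. For larger reads it may be convenient to generate a regular grid instead; the helper below is a hypothetical sketch that assumes locations are `(x, y)` pixel offsets of each tile's top-left corner.

```python
from typing import List, Tuple


def tile_locations(width: int, height: int, tile: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of a regular grid of square tiles, row by row."""
    if tile < 1:
        raise ValueError("tile must be >= 1")
    return [
        (x, y)
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


# A 512 x 512 area split into 256-pixel tiles yields four locations.
print(tile_locations(512, 512, 256))
```

The resulting list could be passed as the `locations` argument in the example above, with `size=(tile, tile)`.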
Authors:
- https://github.com/cdinea
- https://github.com/jakirkham
Approvers:
- Gregory Lee (https://github.com/grlee77)
- Gigon Bae (https://github.com/gigony)
- https://github.com/jakirkham
URL: #1007
Fixing the forward merger in PR: #1019
Forward-merge triggered by push to release/26.02 that creates a PR to keep main up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.