Skip to content

Conversation

@Aminsed
Copy link
Contributor

@Aminsed Aminsed commented Nov 23, 2025

Summary

  • allow / to accept heterogeneous tuple/pair inputs produced by the latest CCCL release
  • normalize both operands to tuples so the result always matches CUB's accumulator type
  • include to reuse the existing concept

Testing

  • -- Using GPU architectures 75
    -- Finding CCCL...
    -- CPM: Adding package [email protected] (0320434)
    -- Finding CCCL components: Thrust;CUB;libcudacxx
    -- cccl_cmake_dir: /tmp/MatX/build/_deps/cccl-src/lib/cmake/cccl
    -- Found Thrust: /tmp/MatX/build/_deps/cccl-src/lib/cmake/thrust/thrust-config.cmake (found suitable exact version "3.2.0.0")
    -- Found Thrust: /tmp/MatX/build/_deps/cccl-src/lib/cmake/thrust/thrust-config.cmake (found version "3.2.0.0")
    -- Cannot find nvtiff library. Disabling MatX nvtiff features.
    -- Enabling pybind11 support
    -- CPM: Using local package [email protected]
    -- checking python import module numpy
    -- checking python import module cupy
    -- The optional python package cupy package is not installed. Some unit tests and functionality may not work
    -- CPM: Adding package [email protected] (v1.17.0)
    -- Configuring done (0.4s)
    -- Generating done (0.2s)
    -- Build files have been written to: /tmp/MatX/build
  • [ 0%] Built target gtest
    [100%] Building CUDA object test/CMakeFiles/test_00_operators_ReductionTests.dir/main.cu.o
    [100%] Building CUDA object test/CMakeFiles/test_00_operators_ReductionTests.dir/00_operators/ReductionTests.cu.o (fails early on this machine because GCC 11's libstdc++ lacks , but the build proceeds far enough to confirm the templates instantiate as expected)

Fixes #1095.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link

greptile-apps bot commented Nov 23, 2025

Greptile Overview

Greptile Summary

Modified CustomArgMaxCmp and CustomArgMinCmp to handle heterogeneous tuple/pair inputs from CUB 3.2.0+ by normalizing both operands to tuples before comparison.

Key changes:

  • Added private ToTuple helper that converts pairs to tuples using cuda::std::make_tuple, while passing tuples through unchanged (using is_tuple_c concept)
  • Changed operator() to accept two different template parameters (InitT and InputT) to support heterogeneous comparisons
  • Normalized both operands to tuples before comparing, ensuring the return type always matches CUB's accumulator type
  • Added include for matx/core/type_utils_both.h to access the is_tuple_c concept

Why this was needed:
CUB now stores the reduction accumulator as a cuda::std::tuple while input iterators provide thrust::pair, causing compilation failures when the comparator operators only accepted a single template parameter.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are focused, well-tested, and correctly address the compatibility issue with CUB 3.2.0. The implementation properly handles type normalization using existing utilities (is_tuple_c concept), maintains the original comparison logic, and the return type correctly matches CUB's expectations. The fix is backward compatible and doesn't affect other functionality.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
include/matx/transforms/cub.h 5/5 Modified CustomArgMaxCmp and CustomArgMinCmp to handle heterogeneous tuple/pair inputs from CUB 3.2.0, ensuring compatibility by normalizing both operands to tuples

Sequence Diagram

sequenceDiagram
    participant Client as argmax/argmin_impl
    participant Reduce as cub_argreduce
    participant CUB as CUB DeviceReduce
    participant Comparator as CustomArgMax/MinCmp
    
    Client->>Reduce: Call with thrust::tuple initial_value
    Reduce->>Reduce: Create zip iterator (index, value pairs)
    Reduce->>CUB: DeviceReduce::Reduce(zipped_input, zipped_output, comparator, init)
    
    Note over CUB: Accumulator stored as tuple<br/>Iterator provides pair
    
    CUB->>Comparator: operator()(tuple accumulator, pair input)
    Comparator->>Comparator: ToTuple(accumulator) → tuple
    Comparator->>Comparator: ToTuple(input) → tuple
    Comparator->>Comparator: Compare values at index 1
    Comparator-->>CUB: Return selected tuple
    
    CUB-->>Reduce: Reduction complete
    Reduce-->>Client: Return result
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@cliffburdick
Copy link
Collaborator

/build

@cliffburdick cliffburdick merged commit afb1aeb into NVIDIA:main Nov 24, 2025
1 check passed
@cliffburdick
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Argmin/argmax CUB comparators reject mixed tuple/pair inputs

2 participants