@cliffburdick
Collaborator

@cliffburdick cliffburdick commented Nov 17, 2025

Converted most of the element-wise operators to support JIT by adding the required capabilities and JIT strings. ND operators are not working yet and need some investigation.

Other changes

  • Split up a large unit test file that was taking a long time to compile
  • Added a GLOBAL_KERNEL capability that indicates whether an operator can run as separate CTAs
  • Added JIT-enabled tests, along with the parameter sweeps, that run when MATX_EN_JIT is enabled. These can take a very long time, so we'll have to figure out when to enable them.
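
To make the capability idea concrete, here is a minimal host-only sketch of how a capability query might propagate through an operator expression and pick a kernel variant. Everything in it (`Capability`, `Leaf`, `BlockOnlyOp`, `NoJitOp`, `AddOp`, `select_kernel`) is a hypothetical stand-in and not MatX's actual API; the only parts taken from this PR are the ideas of a JIT-support capability and a global-kernel capability.

```cpp
#include <cassert>
#include <string>

// Hypothetical capability tags. The names echo the PR text; the layout is an assumption.
enum class Capability { SupportsJit, GlobalKernel };

// A leaf operator (for example a tensor view) that supports every capability.
struct Leaf {
  template <Capability C>
  constexpr bool get_capability() const { return true; }
};

// Hypothetical operator that supports JIT but must stay inside one thread block.
struct BlockOnlyOp {
  template <Capability C>
  constexpr bool get_capability() const { return C != Capability::GlobalKernel; }
};

// Hypothetical operator with no JIT support at all.
struct NoJitOp {
  template <Capability C>
  constexpr bool get_capability() const { return C != Capability::SupportsJit; }
};

// A binary element-wise operator: it has a capability only if both children do.
template <typename L, typename R>
struct AddOp {
  L lhs;
  R rhs;
  template <Capability C>
  constexpr bool get_capability() const {
    return lhs.template get_capability<C>() && rhs.template get_capability<C>();
  }
};

// Picks a kernel flavor from the capabilities of the whole expression.
template <typename Op>
std::string select_kernel(const Op& op) {
  if (!op.template get_capability<Capability::SupportsJit>()) return "non-jit";
  return op.template get_capability<Capability::GlobalKernel>() ? "global" : "block";
}
```

In this sketch, a single block-only child is enough to push the whole expression onto the block-level variant, which is one plausible reason to model the property as a capability rather than a per-kernel flag.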

@copy-pr-bot

copy-pr-bot bot commented Nov 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Collaborator Author

/build

@greptile-apps

greptile-apps bot commented Nov 17, 2025

Greptile Summary

  • Added JIT compilation support to most element-wise operators by implementing SUPPORTS_JIT capability queries and JIT string generation
  • Introduced GLOBAL_KERNEL capability to distinguish between global-level and block-level kernel execution patterns
  • Split large test file operator_func_test.cu into 8 smaller files by category to reduce compilation time

Confidence Score: 3/5

  • This PR has a critical loop bound bug that will cause incorrect execution but is otherwise well-structured
  • Score reflects one critical logical error in jit_kernel.h that must be fixed before merging (loop bound in matxOpT4StrideKernelBlock), but the overall JIT infrastructure design is sound
  • Pay close attention to include/matx/executors/jit_kernel.h - contains a critical loop bound bug at line 274
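
The conversation does not show the actual bug, so the following is only a generic illustration of how a wrong loop bound in a strided per-thread loop can silently skip work. It simulates one block on the host; `visit_counts` and `Bound` are hypothetical names, and the truncated bound is an assumed failure pattern, not the code from jit_kernel.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Bound { Truncated, Exact };

// Simulates one block of `threads` threads walking `n` elements with a
// stride of `threads`. Returns how many times each element was visited.
std::vector<int> visit_counts(std::size_t n, std::size_t threads, Bound bound) {
  std::vector<int> counts(n, 0);
  for (std::size_t tid = 0; tid < threads; ++tid) {
    if (bound == Bound::Truncated) {
      // Assumed bug pattern: n / threads rounds down, so when n is not a
      // multiple of threads the last partial pass is never executed.
      for (std::size_t i = 0; i < n / threads; ++i) {
        ++counts[tid + i * threads];
      }
    } else {
      // Bounding on the element index itself covers the tail exactly once.
      for (std::size_t idx = tid; idx < n; idx += threads) {
        ++counts[idx];
      }
    }
  }
  return counts;
}
```

With `n = 10` and `threads = 4`, the truncated variant visits elements 0 through 7 and never touches 8 and 9, while the exact variant visits every element once. Errors of this kind tend to pass tests whose sizes are multiples of the block size, which is why parameter sweeps over sizes are useful.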

Important Files Changed

Filename | Overview
include/matx/core/capabilities.h | Added GLOBAL_KERNEL capability enum and attributes, changed SUPPORTS_JIT default from true to false, added JIT support check for non-MatX operators
include/matx/executors/jit_kernel.h | Split kernels into Block and global variants, updated to use CurrentCapabilities template parameter, contains loop bound bug in matxOpT4StrideKernelBlock (line 274)
include/matx/executors/jit_cuda.h | Added logic to check GLOBAL_KERNEL capability and select appropriate kernel variants, integrated new JIT infrastructure with executor

Sequence Diagram

sequenceDiagram
    participant User
    participant CUDAJITExecutor
    participant CapabilitySystem
    participant KernelProvider
    participant NVRTC
    participant GPU

    User->>CUDAJITExecutor: Exec(op)
    CUDAJITExecutor->>CapabilitySystem: Check SUPPORTS_JIT capability
    CapabilitySystem-->>CUDAJITExecutor: JIT supported
    CUDAJITExecutor->>CapabilitySystem: Query GLOBAL_KERNEL capability
    CapabilitySystem-->>CUDAJITExecutor: global_kernel flag
    CUDAJITExecutor->>KernelProvider: create_kernel_provider(sizes, jit=true, global_kernel)
    CUDAJITExecutor->>KernelProvider: find_best_launch_params(op, kernel_provider)
    KernelProvider-->>CUDAJITExecutor: ept, shm_size, block_size, groups_per_block
    CUDAJITExecutor->>CUDAJITExecutor: get_grid_dims(blocks, threads, sizes, ept)
    CUDAJITExecutor->>NVRTC: nvrtc_compile_and_run(op, sizes, blocks, threads, ept, stride, shm_size)
    NVRTC->>GPU: Launch JIT-compiled kernel
    GPU-->>User: Results
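
One step in the diagram above, sizing the grid from the problem size and the elements-per-thread value (ept), can be sketched as a one-dimensional simplification. The helper names `ceil_div` and `blocks_for` are hypothetical; the real `get_grid_dims` is not shown in this conversation and presumably handles multi-dimensional sizes.

```cpp
#include <cassert>
#include <cstddef>

// Round-up integer division; assumes b > 0.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
  return (a + b - 1) / b;
}

// Hypothetical 1D helper: each block covers threads * ept elements, so the
// block count is the element count divided by that, rounded up.
constexpr std::size_t blocks_for(std::size_t n, std::size_t threads, std::size_t ept) {
  return ceil_div(n, threads * ept);
}
```

Under these assumptions, 1000 elements with 256 threads and an ept of 4 fit in a single block (1024 elements of capacity), while 1025 elements need two blocks, with the second block almost entirely idle. That tail is where a bounds check inside the kernel matters.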


@greptile-apps greptile-apps bot left a comment

91 files reviewed, 1 comment



@greptile-apps greptile-apps bot left a comment

92 files reviewed, no comments


@cliffburdick
Collaborator Author

> 91 files reviewed, 1 comment

Critical loop bug fixed. Please look again.

@greptile-apps

greptile-apps bot commented Nov 18, 2025

Skipped: This PR changes more files than the configured file change limit: (105 files found, 100 file limit)

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick force-pushed the jit_op_unit_tests branch 3 times, most recently from 62bdbb4 to dc08be7, November 24, 2025 22:04
@cliffburdick
Collaborator Author

/build

