Add challenge 70: Merge Sorted Arrays (Medium)#179

Open
claude[bot] wants to merge 1 commit into main from challenge/70-merge-sorted-arrays

Contributor

@claude claude bot commented Feb 19, 2026

Summary

  • Adds challenge #70: Merge Sorted Arrays (Medium difficulty)
  • Given two sorted float32 arrays A (M elements) and B (N elements), produce a single sorted output array C of M+N elements
  • The sequential merge has a serial dependency chain, so an efficient GPU solution must use the merge path (co-ranking) algorithm: each thread binary-searches along its own diagonal to find where the merge path crosses it, then merges a local tile with no inter-thread dependencies
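
For reviewers unfamiliar with co-ranking, here is a minimal CPU sketch of the idea in plain Python. It is not the challenge's reference solution; the names `co_rank`, `merge_path` and `tile` are hypothetical, and the tie-breaking rule (elements of A before equal elements of B) is an assumption of this sketch. Each iteration of the outer loop stands in for one GPU thread.

```python
def co_rank(k, a, b):
    """Return i such that the first k merged outputs are a[:i] and b[:k - i].

    Assumption of this sketch: on ties, elements of `a` come first.
    """
    lo = max(0, k - len(b))  # fewest elements that can come from a
    hi = min(k, len(a))      # most elements that can come from a
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        # If a[i] <= b[j - 1], then a[i] belongs among the first k outputs,
        # so the split point lies to the right of i.
        if a[i] <= b[j - 1]:
            lo = i + 1
        else:
            hi = i
    return lo


def merge_path(a, b, tile=4):
    """Merge sorted a and b; each tile of outputs is produced independently."""
    total = len(a) + len(b)
    out = [None] * total
    for start in range(0, total, tile):  # one iteration = one "thread"
        end = min(start + tile, total)
        i, i_end = co_rank(start, a, b), co_rank(end, a, b)
        j, j_end = start - i, end - i_end
        # Local sequential merge of a[i:i_end] and b[j:j_end].
        for k in range(start, end):
            if j >= j_end or (i < i_end and a[i] <= b[j]):
                out[k] = a[i]
                i += 1
            else:
                out[k] = b[j]
                j += 1
    return out
```

For example, `merge_path([1, 3, 5, 7], [2, 4, 6, 8], tile=3)` returns `[1, 2, 3, 4, 5, 6, 7, 8]`. Because every tile depends only on its two co-ranks, the outer loop iterations can run in any order, which is what makes the decomposition suitable for a GPU.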

Why this is interesting

This challenge exercises GPU concepts that are underrepresented in the current set:

  • Independent work decomposition at the thread/block level for an inherently sequential algorithm
  • Binary search on GPU to compute co-ranks
  • Merge path algorithm (Odeh et al., 2012) — a classic GPGPU primitive
  • Memory access patterns for irregular work distribution

It is not element-wise at all; every output element's value depends on a non-trivial combination of positions in both input arrays.

Test plan

  • 13 functional test cases covering: edge cases (M=1/N=1), all-zero inputs, all-negative inputs, interleaved ranges, non-overlapping ranges, power-of-2 sizes (16, 64/128, 256/512), non-power-of-2 sizes (30/25, 100/73, 255/200), realistic sizes (1K/500, 5K/3K)
  • Performance test: M = N = 10,000,000 (80 MB output, well within 16 GB VRAM × 5)
  • All 6 framework starters present (CUDA, PyTorch, Triton, JAX, CuTe, Mojo)
  • black, isort, flake8 --bugbear, clang-format all pass
  • Challenge loads via importlib (module name + signature verified)
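
As a rough illustration of what the functional checks amount to, the sketch below generates sorted float32 inputs for a few of the size pairs listed above and compares a solver's output against a sorted concatenation. It is not the PR's actual test harness: `make_sorted`, `sequential_merge`, `check` and `SIZES` are hypothetical names, the value range and seeds are arbitrary, and a plain two-pointer merge stands in for the solver under test. The last line reproduces the 80 MB output figure for the performance test.

```python
import random
from array import array


def make_sorted(n, lo, hi, seed):
    """Return n sorted values stored as float32 (array type code 'f')."""
    rng = random.Random(seed)
    return array("f", sorted(rng.uniform(lo, hi) for _ in range(n)))


def sequential_merge(a, b):
    """Stand-in solver: the serial two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def check(solve, m, n, seed=0):
    """Compare solve(A, B) with a sorted concatenation of A and B."""
    a = make_sorted(m, -100.0, 100.0, seed)
    b = make_sorted(n, -100.0, 100.0, seed + 1)
    return list(solve(a, b)) == sorted(list(a) + list(b))


# A subset of the (M, N) pairs named in the test plan.
SIZES = [(64, 128), (256, 512), (30, 25), (100, 73), (255, 200)]
all_pass = all(check(sequential_merge, m, n) for m, n in SIZES)

# Output size of the performance test: (M + N) float32 values, 4 bytes each.
perf_output_mb = (10_000_000 + 10_000_000) * 4 / 1e6  # 80.0
```

Swapping `sequential_merge` for a GPU-backed solver would exercise the same comparison.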

🤖 Generated with Claude Code

Introduces a parallel merge challenge requiring solvers to merge two
sorted float32 arrays into a single sorted output. Unlike a trivial
sequential merge, an efficient GPU implementation must use the merge
path (co-ranking) technique — each thread independently binary-searches
to determine which slice of A and B it is responsible for, then merges
locally without serial dependencies.

Key GPU concepts exercised: independent work decomposition, binary
search on GPU, merge path algorithm, memory access patterns.

Includes 13 functional tests (edge cases, powers-of-2, non-powers,
negative values, non-overlapping ranges, realistic sizes) and a
performance test at M = N = 10,000,000.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kunal-mansukhani
Contributor

@claude Rebase and bump this pr to be challenge_id 71
